- Many are quite pessimistic about the prospects of voice recognition technology, and claim that its inferiority vs. existing interfaces makes the privacy trade-off not worth it
- I believe that voice recognition has a bright future, and that the privacy trade-off will become more acceptable as the technology is enhanced and its use cases are expanded
- Technical enhancements will be driven by competitive pressure to seize the intrinsic and strategic value of voice data, and control the ownership of voice channels
- New use cases will come from third-party developers that contribute to voice providers’ open ecosystems
Voice recognition technology is a hot topic these days. Surprisingly though, we don’t seem to witness the irrational exuberance that typically goes hand in hand with the diffusion of new technologies. I would even argue that it’s quite the opposite: most people I have interacted with seem to believe voice recognition is just a fad.
The typical claims I hear revolve around this technology being inaccurate, useless and a threat to privacy. More precisely:
- The current strong growth of adoption is mainly driven by short-lived curiosity and aggressive marketing from voice recognition technology providers (e.g., Cortana defaulted on Windows 10, Google’s and Amazon’s aggressive pricing of their smart speaker lines)
- Voice recognition interfaces are inferior to existing ones for most use cases (e.g., search, messaging)
- Privacy concerns will hold consumers back, as the privacy trade-off is not worth it for most people due to how limited the use cases are.
All these claims are true. Still, I believe that voice recognition technology has a bright future, and the potential to create tremendous value for those able to control and leverage it.
Here, I’ll look mainly at why I believe that the claims above are not incompatible with generalized adoption of voice technology, and how competition and open ecosystems will drive innovation in the field.
Limited functionality is part of the process
Skeptics’ claims are compelling because they are intuitive and get a lot of things right. Yes, it is true that voice recognition technology currently performs well on just a few use cases. It is also true that adoption is largely driven by aggressive marketing. And the public’s privacy concerns are even more valid.
However, there is a key point that they are missing. Innovation always starts with a few use cases, and typically involves unbalanced trade-offs that only very few early adopters are willing to take at first.1
In other words, the fact that voice recognition technology under-performs existing interfaces for most use cases is not an anomaly that suggests the technology is heading nowhere – it’s a normal part of the innovation process.
Think of current voice technology interfaces as the equivalent of the first WAP mobile browsers.2 They were inferior in all aspects to desktop browsers: slow, expensive and uncomfortable. Just like voice recognition technology today, WAP browsers only made sense for very few use cases (e.g., checking your email when no computer was around). Fifteen years down the line, the technology has evolved, and I now quite often find myself preferring to use my phone over a desktop to browse the internet.
I believe that the exact same pattern will repeat with voice technology. As the technology improves, it will become more useful and more prevalent. In concrete terms, consumers are now mainly interested in using voice recognition in their cars (cf. figure below). In the future, demand will rise for other situations as new use cases develop, and consumers become used to the technology. Hotels and offices seem like the most immediate expansion areas, with Marriott piloting Alexa in select hotel rooms, and Amazon rolling out Alexa for Business.
Finally, a big part of the equation that we Westerners tend to miss is how convenient these interfaces are to people using other writing systems. China is the most striking example, with adoption up 1,500% this year, and forecast to grow at a 220% CAGR over 2017-23.
Competition will drive technological enhancements
All the statements I’ve made in the previous section rely on the assumption that voice recognition technology will get better over time and expand to new use cases. I am confident in this assumption. More precisely, I believe that enhancements will mainly be driven by competitive pressure, and that new use cases will be generated thanks to the openness of voice ecosystems.
Competition is driven by three factors:
- The intrinsic value of the voice data and ownership of voice-activated channels
- The strategic value of these same items for companies that rely on them to capture future growth (e.g., Amazon’s plans to use Alexa to ramp up its advertising business)
- The winner-take-all characteristic of the voice recognition technology market.
The best way to grasp the intrinsic value of voice data, and the importance of owning voice-activated communication channels, is to think of this as a new opportunity to seize the search market. Voice data carries similar value in that it can be used for the exact same purpose: to better understand customers’ needs, tailor their experience and serve them targeted ads. Similarly, ownership of the channel means having direct access to consumers, which can be both used by the owner and resold in the form of advertisements.
This has strategic value for each of the Big Tech players. For Google, succeeding is key to expanding and protecting its search empire. For Amazon, victory is all about unlocking new growth in the advertising business and securing a key e-commerce channel. For Apple, it’s probably more tied to building additional software differentiation factors that could drive hardware purchases and advertising revenues at the same time.3
Finally, competitive intensity is heightened by the fact that voice recognition technology has the typical profile of a winner-take-all market. That is, the most used provider can reap benefits that will drive its differentiation further. In this case, the main benefit is access to more voice input than the others, which refines the accuracy and relevance of the voice recognition technology.
There is an important nuance, though: each language is a new market. Just as was the case with social media, winners in China might be different from winners in France or in the US.
Open platforms will drive use case expansion
We’ve just described a common scenario, where competition drives technological enhancement. A less common scenario is also at play when it comes to voice recognition technology: collaboration-driven technological enhancement, where external developers contribute to differentiating a provider from the rest (think third-party apps on Apple’s App Store).
All major providers that I am aware of have released SDKs of sorts, and hope to onboard as many developers as possible to build exciting, differentiating apps for their platform. Amazon currently leads the way, with over 25,000 skills (i.e., apps) available, way ahead of Google Assistant (c. 2,000) and Cortana (c. 250). Volume is not all that matters though. Quality and relevance matter too.
This bit will be very exciting to watch as this is where the new use cases will emerge.
A few recent examples include predictive shopping lists by Monoprix (French retailer, supported by a consultancy called Artefact)4, voice-activated thermostats by Johnson Controls, Honeywell and Schneider Electric, as well as real-time translation devices by Waverly Labs.