This September StageZero visited the Conversational AI Summit London, a two-day summit for key players in the conversational AI ecosystem to discuss the future of conversational AI across principal industries such as e-commerce, telecoms, and banking. Apart from the high of being back at in-person events again, we had the opportunity to speak face-to-face with the stars of the conversational AI industry, including key players from Meta, Bosch, Amazon, Google, the BBC, and more. Here we share the top five insights we came away with, and what they mean for the future of conversational AI.
The principal challenge for conversational AI comes down to handling languages other than English. Most of the presentations over the two days focused on, or at least mentioned, this as a key problem that even enterprise-sized players are struggling to handle successfully. The main reason for this struggle is a distinct lack of data for languages other than English. While data for lower-resource languages is scarce, market demand is remarkably high. One presenter cited receiving dozens of emails every week asking for products in languages other than English, demand they are unable to satisfy due to the lack of data availability in those languages. This sentiment was echoed by other key players at the event, who are struggling not only with the quantity of available data, but also with the quality of the little data that is out there.
As the market continues to grow and the benefits of conversational AI become increasingly familiar to customers, customers are becoming more demanding when it comes to localized conversational AI solutions. The key players in this field understand that to satisfy the market, they will need to secure a high volume of varied, reliable data across a plethora of lower-resource languages. They understand this because their customers remind them of it daily. The resounding theme from the enterprise presentations was one of preparation for this demand, which is expected to grow dramatically. Industry leaders are therefore already securing solutions for obtaining quality data, especially for European languages such as Romanian, German, and Italian. One presenter explained that their chatbot is now being extended into 16 more languages, and that their bots must understand at least 80% of their customers' speech to be considered functional. This requires a large amount of quality data, and demand for such data is expected to increase drastically over the next few years.
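As a rough sketch of how such an 80% threshold might be measured (the intent labels below are invented examples, not from any presenter), the understanding rate is simply the fraction of user utterances whose intent the bot classified correctly:

```python
def understanding_rate(predicted_intents: list[str], true_intents: list[str]) -> float:
    """Fraction of utterances whose intent was recognized correctly."""
    if len(predicted_intents) != len(true_intents):
        raise ValueError("prediction and label lists must be the same length")
    correct = sum(p == t for p, t in zip(predicted_intents, true_intents))
    return correct / len(true_intents)

# A bot just clearing the (hypothetical) 80% bar: 4 of 5 intents correct.
rate = understanding_rate(
    ["book_flight", "check_balance", "greeting", "check_balance", "cancel"],
    ["book_flight", "check_balance", "greeting", "transfer", "cancel"],
)
```

In practice the evaluation set would span thousands of utterances per language, which is exactly where the data scarcity bites.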
What does this mean for the future of conversational AI? As the market develops quickly, more solutions will become available and usable in localized languages across the EU, extending benefits currently restricted to English to the rest of Europe via applications in customers' native languages and dialects. This will allow industry leaders to tap into an eager, underserved market and ensure high ROI on their projects.
While the old challenge of combining letters and numbers in English has been mostly overcome, it is still very much alive in other languages. Most of the companies presenting at the summit deal with codes of some sort regularly, and this creates a nightmare scenario for their conversational AI implementations outside of English.
Such codes can be, for example, a postcode or a customer identification number, or even a combination of letters and numbers found in addresses (like "apartment 25 A, 62 Acorn Street"), and they are used in almost all of these companies' automated conversations. Critically, such codes are often used by customers to verify their identity at the very beginning of a call, so successful implementation is crucial.
For a conversational AI project to succeed, companies must be able to handle such codes and combinations accurately and quickly. Today, the majority of key players in the industry cannot; even the trailblazers are banging their heads against this one. They are spending disproportionate amounts of their project budgets, time, and energy on testing and retesting these issues, while the solution is relatively simple and again comes down to the availability and quality of training data.
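To make the problem concrete, here is a minimal sketch of one common mitigation: normalizing a speech-recognition transcript of a spoken code into canonical form, then validating it against an expected pattern. The spoken-token table, the code format, and the function names are our own illustrative assumptions, not any vendor's implementation, and a real system would need locale-specific tables for each language (the core of the data problem described above).

```python
import re

# Hypothetical English-only mapping from spoken tokens to characters.
# Each additional language needs its own table ("zwei" -> "2", "cinq" -> "5", ...).
SPOKEN_TO_CHAR = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

def normalize_code(transcript: str) -> str:
    """Collapse an ASR transcript like 'two five a' into '25A'."""
    chars = []
    for token in transcript.lower().split():
        if token in SPOKEN_TO_CHAR:
            chars.append(SPOKEN_TO_CHAR[token])
        elif len(token) == 1 and token.isalpha():
            chars.append(token.upper())
        # other tokens ("apartment", fillers) are dropped
    return "".join(chars)

def looks_like_customer_id(code: str) -> bool:
    """Hypothetical format: two digits followed by one letter, e.g. '25A'."""
    return re.fullmatch(r"\d{2}[A-Z]", code) is not None
```

Even this toy version shows why training data matters: the bot can only normalize tokens it has seen transcribed correctly, and ASR errors on digits in low-resource languages break the whole chain.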
Since this data is difficult to source for languages other than English, efficient training of the conversational bot is tricky indeed. As it happens, this is exactly the problem we solve at StageZero, by sourcing such data quickly and at low cost. Contact us to find out more.
Interest in the room peaked during the presentations about more realistic-sounding conversations and the progress made here recently. The improvements have been ground-breaking and are set to continue at least at this pace.
From a user perspective, the principal benefit of large language models (LLMs) in conversational AI is their ability to make the conversation feel much more natural to the user. Other technology, such as Google's WaveNet neural network, allows for smoother audio processing and even the use of speech disfluencies to create more realistic-sounding voices.
The advances made in such technologies have enabled trends like taking a step away from a robotic-sounding voice towards a more natural, human-sounding one. The market seems excited about the potential here. Companies are designing personas for their bots and selecting carefully curated voices to match the persona. This can involve choosing a gender, an accent, and even a specific vocabulary to match the persona of the bot, leading to a conversational experience that more closely resembles speaking with a real human. A couple of industry leaders showcased bots with distinct types of voices for different situations, such as "newscaster," "storytelling," and "customer service."
Google Duplex is a prime example of how this can be taken to the next level, with the bot using disfluencies such as the fillers "uhm" and "umm," matched with a voice tempo closer to that of a real human. Examples of Duplex went viral as early as 2018, when the bot was shown booking appointments over the phone; it was later used to call companies to verify their COVID-era opening hours, and listeners were surprised by how realistic the conversations felt.
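As a toy illustration of the disfluency idea (not how Duplex actually works; its pipeline is far more sophisticated), one could imagine sprinkling fillers into a bot's response text before it reaches the text-to-speech stage. The filler list and insertion policy below are purely hypothetical:

```python
import random

FILLERS = ["uhm", "umm"]

def add_disfluencies(text: str, rate: float = 0.15, seed=None) -> str:
    """Insert a filler after clause boundaries (commas) with probability `rate`.

    A seeded RNG makes the behavior reproducible for testing.
    """
    rng = random.Random(seed)
    out = []
    for word in text.split():
        out.append(word)
        if word.endswith(",") and rng.random() < rate:
            out.append(rng.choice(FILLERS) + ",")
    return " ".join(out)
```

Restricting insertions to clause boundaries is the simplest way to keep the fillers from landing mid-phrase, where they sound unnatural rather than human.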
In parallel with the amazement came concern about the ethics surrounding such technology, and these concerns persist today. Day two of the summit saw a panel discussion exploring questions around the psychological impact of using a realistic voice and persona, and what this holds for the future of humanity. It was largely agreed that a bot should inform the human that it is a bot. Most of the participants in the room said they felt uneasy at the prospect of conversing with a human-sounding bot, but that within the next five years they would probably get used to it. Companies such as the BBC were already using human-sounding bots, and they did not always specify that it was a bot; crucially, however, their bot was used for text-to-speech rather than for conversational interactions, which significantly shaped people's perception.
Overall, there seemed to be an eagerness to explore the potential of realistic conversations and the technologies related to their implementation, while keeping privacy and ethics at the forefront of the conversation.
Basic questions and answers are pretty well covered nowadays, especially in English. But as users become more demanding, we need something bigger to wow them, and multimodal conversational AI holds the key. Multimodal systems bring conversational AI beyond the basics, incorporating intent, context, and personalization into the conversation and resulting in a more natural, empathetic experience for the user.
Sounds futuristic, right? But the industry leaders are already there, and their bots are ready to participate in discussions with users on a level you might not have expected at this stage. During the summit we watched demonstrations of conversational bots taking turns in conversations with humans, even in multi-speaker settings. The bots consistently demonstrated an implicit understanding of who was talking to whom.
Such systems rely on a combination of different AI components working in harmony to produce unified output. Computer vision studies the body language of the human participants to better understand who is addressing whom. High volumes of speech data allow a clearer understanding of who is speaking, when the expected response should come, and what it should be.
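A sketch of how such fusion might look, assuming each modality emits a per-speaker confidence score for "who is being addressed" (the weighting scheme, names, and scores here are our own simplification of what is in reality a much richer model):

```python
def fuse_addressee_scores(
    vision: dict[str, float],
    audio: dict[str, float],
    w_vision: float = 0.6,
) -> str:
    """Return the most likely addressee by blending per-modality scores.

    `vision` might come from gaze/body-language analysis, `audio` from
    speaker diarization; both map speaker names to confidences in [0, 1].
    """
    speakers = set(vision) | set(audio)
    fused = {
        s: w_vision * vision.get(s, 0.0) + (1 - w_vision) * audio.get(s, 0.0)
        for s in speakers
    }
    return max(fused, key=fused.get)
```

The point of the weighted blend is that neither modality alone is reliable: gaze is ambiguous in group settings, and voice alone cannot tell who a remark was aimed at.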
Not only that, but emotion AI is also clearly on the rise. Companies demonstrated their bots performing a plethora of tasks, and while the bots mostly empathize through lexicon for now, many leaders explained that they are starting to teach tone and mirroring to their conversational AI applications too. This will enhance the user experience even further and help the bots avoid inappropriate responses. Fine-tuning such delicate projects requires a large amount of sentiment analysis training data, so we were happy to validate this market development, since that is exactly where we shine.
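For a flavor of what lexicon-based empathy means in practice, here is a deliberately minimal sketch. The word lists and response prefixes are invented for illustration; production systems replace the hand-written lexicon with sentiment models trained on large volumes of labeled data, which is precisely the data this paragraph is about:

```python
# Tiny illustrative lexicons; real systems use thousands of weighted entries.
NEGATIVE = {"angry", "frustrated", "broken", "terrible", "waiting"}
POSITIVE = {"great", "thanks", "happy", "perfect", "resolved"}

def sentiment_score(utterance: str) -> int:
    """Crude polarity: positive word count minus negative word count."""
    words = utterance.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def empathetic_prefix(utterance: str) -> str:
    """Pick an empathetic opener based on the detected polarity."""
    score = sentiment_score(utterance)
    if score < 0:
        return "I'm sorry to hear that. "
    if score > 0:
        return "Glad to hear it! "
    return ""
```

The limits of the approach are obvious from the sketch itself: sarcasm, negation ("not broken"), and tone of voice all slip straight past a lexicon, which is why the leaders at the summit are moving toward tone and mirroring.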
Over the two-day summit we noticed a clear, consistent trend: a lack of availability across the board of conversational AI data in low-resource languages. This can be attributed to several factors, the obvious one being the sharp growth in market demand, which is scaling quickly outside of English-speaking countries. As native speakers of low-resource languages learn more about the benefits of conversational AI applications, they naturally want access themselves. This requires localization of the application, which in turn requires high volumes of good-quality training data.
As conversational AI grows as a field, its experts have started to notice patterns and trends in their own projects, their customers' projects, and their competitors' projects, and they are particularly interested in roadblocks and how to resolve them. Many of the roadblocks presented at the summit can be solved with good-quality training data, for example the issue around codes mentioned above. This issue was cited consistently by several key industry players, especially in relation to languages other than English.
The summit provided us with solid validation that our technology is at the forefront of the conversational AI revolution. Indeed, our estimates show that we have the largest network of annotators in the world, which makes it particularly easy for us to solve issues relating to low-resource languages.