Although voice assistants have existed in various forms for several decades, they have matured dramatically in recent years, with major breakthroughs since the 2010s. 

Fast forward a decade, and voice recognition is now the celebrity of the tech industry: experts forecast that the market will grow from $10.7 billion in 2020 to almost $30 billion by 2026. According to a Google report, 27% of the global online population uses voice search on their mobile phones. With the technology's growing popularity, businesses should pay close attention to voice recognition software such as voice assistants. 

So how do voice assistants work, and what benefits and challenges come with them? How can you overcome those challenges? What is a voice command, and what is a skill command? How do you build your own solution? Which voice assistant services exist, and for which languages? What is the future of voice assistants? Look no further than our guide. 

What are voice assistants?  

Maybe you’re already familiar with Siri by Apple, Alexa by Amazon, or Google Assistant. These are all examples of voice assistants: digital assistants that rely on voice recognition technology to provide information or perform tasks in response to spoken commands, so users simply interact with them verbally. 

The voice assistant revolution emerged in the 2010s alongside the launch of smartphones and smart speakers. In 2011, Apple's Siri launched together with the iPhone 4S; Amazon's Alexa and the Amazon Echo smart speaker followed in 2014; and Google Assistant and the Google Home smart speaker arrived in 2016. 

The voice assistants on the market each have their own strengths and weaknesses, but in general they all carry out a plethora of tasks – typical examples include dialing and texting a contact, setting up appointments and reminders, answering questions, managing smart home devices, and playing music. 

Fast forward to today, and voice assistants have become an increasingly common technology for human and machine interaction. They’ve expanded to a wider range of devices, such as smart TVs, watches, cars, and fridges, and with the help of artificial intelligence (AI), voice assistants make our everyday life easier. 

How do voice assistants work? 

Most voice assistants consist of four components: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), assistant software, and text-to-speech (TTS) technology. 

When a user gives a command or asks a question to a voice assistant, ASR enables the machine to detect and translate their sentence from speech to text. 

It is then NLP’s turn to help the system recognize the intent of the speech, including the meaning and the context. 

Once the words and the intention are understood, the assistant software processes the command or intent, and the machine can respond. 

Finally, TTS converts this response from written to spoken form, allowing the voice assistant to talk and to respond to what we are asking. All the steps take just a few seconds, and results are available almost immediately.  
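
To make this four-stage flow concrete, here is a minimal sketch in Python. Every function below is a hypothetical placeholder returning canned values; in a real assistant, each stage would wrap an actual ASR, NLP, or TTS engine.

```python
# Minimal sketch of the four-stage voice assistant pipeline.
# Each stage function is a hypothetical placeholder returning canned
# values; a real system would wrap ASR, NLP, and TTS engines here.

def speech_to_text(audio: bytes) -> str:
    """Stage 1, ASR: detect speech and transcribe it to text."""
    return "what's the weather in Helsinki"  # placeholder transcription

def extract_intent(text: str) -> dict:
    """Stage 2, NLP: recognize the intent, meaning, and context."""
    return {"intent": "get_weather", "location": "Helsinki"}  # placeholder

def handle_intent(intent: dict) -> str:
    """Stage 3, assistant software: map the intent to a response."""
    if intent["intent"] == "get_weather":
        return f"Here is the weather for {intent['location']}."
    return "Sorry, I can't help with that yet."

def text_to_speech(response: str) -> bytes:
    """Stage 4, TTS: convert the written response to spoken audio."""
    return response.encode()  # placeholder for synthesized audio

def assistant_turn(audio: bytes) -> bytes:
    text = speech_to_text(audio)        # 1. ASR
    intent = extract_intent(text)       # 2. NLP
    response = handle_intent(intent)    # 3. assistant software
    return text_to_speech(response)     # 4. TTS

print(assistant_turn(b"<raw microphone audio>"))
```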

Types of voice assistants 

There are two types of voice assistants: conversational and command-based. Examples of the conversational type are, again, Siri and Alexa, where we can hold conversations with an AI that mimics human interaction. This is what we know from movies and real life: we converse with the AI system, and it instantly answers the questions we ask. 

The other type, command-based voice assistants, don't hold conversations. Instead, you simply tell the assistant what to do, and it executes your command. 

Language support in voice assistant services 

Apple’s Siri 

21 languages: Arabic; Cantonese; Danish; Dutch; English; Finnish; French; German; Hebrew; Italian; Japanese; Korean; Malay; Mandarin; Norwegian; Portuguese (Brazil); Russian; Spanish; Swedish; Thai; Turkish 

Amazon’s Alexa 

8 languages: English; French; German; Hindi; Italian; Japanese; Portuguese (Brazil); Spanish 

Google Assistant 

12 languages: Danish; Dutch; English; French; German; Hindi; Italian; Japanese; Korean; Norwegian; Spanish; Swedish 

How to integrate a voice assistant into your business 

Motivated to implement a voice assistant in your business? We recommend the following steps to ensure success. 

Step 1. Plan ahead: 

It's important to define your goals right from the start of your voice assistant implementation, and your goals should be as clear and specific as possible. 

Step 2. Allocate resources: 

You planned for resources in the planning stage; now it's time to allocate them accordingly. Communicate the project clearly to all stakeholders and make sure their roles are clear. We recommend an initial kick-off meeting to communicate roles, responsibilities, expectations, and deadlines. Ensure all tasks are covered and understood, and that suitable technical resources are allocated. 

Step 3. Select a voice assistant platform: 

Do your research thoroughly and choose a voice assistant platform that is suitable for your business needs. Popular options include Apple Siri, Google Assistant, and Amazon Alexa. Consider elements like platform features, compatibility with your existing systems, and developer resources available. 

Step 4. Create your Voice User Interface (VUI):  

Design a user-friendly and intuitive voice user interface that meets customer expectations and aligns with your brand. Take into account the voice tone, personality, and conversational flow of the assistant to ensure a positive user experience. 

Step 5. Develop custom skills:  

If necessary, develop custom skills for your voice assistant to execute certain tasks related to your business. For example, if you have an e-commerce store, you might want the assistant to provide product information, process orders, or answer customer inquiries. 
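
As a rough illustration of what such a custom skill could look like, the sketch below handles a hypothetical product-information intent for an e-commerce store. The intent name, slot format, and catalog are illustrative assumptions rather than any specific platform's API; real skills are built with the SDK of the platform you chose in step 3.

```python
# Illustrative intent handler for a hypothetical e-commerce skill.
# Intent names, slots, and the catalog are made-up examples; a real
# skill would use the handler classes of your chosen platform's SDK.

CATALOG = {
    "running shoes": {"price": 89.99, "in_stock": True},
    "rain jacket": {"price": 54.50, "in_stock": False},
}

def handle_product_info(request: dict) -> str:
    """Answer 'tell me about <product>' style requests."""
    product = request.get("slots", {}).get("product", "").lower()
    item = CATALOG.get(product)
    if item is None:
        return f"Sorry, I couldn't find {product} in our store."
    availability = "in stock" if item["in_stock"] else "out of stock"
    return f"{product.title()} costs ${item['price']:.2f} and is currently {availability}."

# Example request, as the platform might pass it to your handler:
request = {"intent": "GetProductInfo", "slots": {"product": "running shoes"}}
print(handle_product_info(request))
# -> Running Shoes costs $89.99 and is currently in stock.
```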

Step 6. Integrate with existing systems:  

Ensure seamless integration with your existing systems and processes. This can include linking the voice assistant to your CRM, e-commerce platform, inventory management system, or any other relevant business tools. APIs and integration tools provided by the voice assistant platform can assist with this step. 

Step 7. Test and iterate:  

Thoroughly test the voice assistant's functionality, accuracy, and user experience. Gather feedback from testers and users to identify any areas for improvement. Iterate and refine the voice assistant based on user feedback and evolving business needs. 

Read more: The ultimate guide to implementing Conversational AI in your business

Benefits of voice assistants 

Interesting fact: in 2020, 58% of consumers used voice search to discover local businesses, and 46% of them did so daily. This shows that voice assistants can significantly benefit businesses by helping customers during their buying journey. Let's find out what these benefits are! 

Refined customer service: 

Voice assistants can make customer experiences quick, easy, convenient, and more personalized. They are smart enough to pick up from previous conversations and come up with personalized resolutions that best suit the customer. 

Customers can speak to the installed voice assistant just as they would with a human customer representative. With the power of AI, voice assistants offer customers immediate answers and solve their inquiries, without any human intervention.  

A quicker and smarter customer experience makes customers more likely to recommend the business to others and to come back as returning customers. 

Boosted conversions: 

A rapid and easy customer experience often leads to a successful conversion to sales. For example, quick and convenient product consultation from a voice bot can help an online customer proceed smoothly to checkout, rather than abandoning the products in their cart. 

Reduced costs: 

Using a voice assistant helps companies save money on staffing expenses by reducing the demand for manpower. Implementing a voice assistant is therefore an important decision to consider in your long-term business plan. 

Enhanced work efficiency: 

Time is money, and voice assistants help businesses to save that. Talking and listening to a voice assistant is of course quicker and more efficient than typing out questions and reading responses from a customer representative or even a chatbot. 

A reduced handling time means improved work efficiency. With a voice assistant taking care of a large number of customer inquiries, your human employees can focus on more important tasks that really require their involvement.  

Read more: How to develop a good chatbot

Streamlined operations: 

Beyond customer inquiries, voice assistants can also handle emails, reports, reminders, meetings, and many other mundane tasks instantly and efficiently. 

Voice assistants are an example of digital technology that never stops working. They assist businesses with managing reports, data, and systems, creating a smooth, constantly supervised day-to-day operational flow. 

Challenges in implementing voice assistants for business 

Data security and privacy: 

Now that smart technologies and devices are becoming part of our everyday lives, data privacy has become a crucial concern. Protecting it is especially important in the case of voice assistants, which are in constant close contact with your customers. Keeping data and confidential information secure is one of the most critical concerns, particularly in sectors such as banking and finance. 

Protecting customers' data is a priority, so businesses need to ensure that their voice assistants only collect and process essential data. To prepare for a voice assistant implementation, businesses should therefore carefully check and comply with all privacy and data protection requirements applicable in their country, for example the General Data Protection Regulation (GDPR).

See more: StageZero's checklist to ensuring privacy compliance globally 

Finding the right AI development partner: 

AI development is an extremely complex domain of technology that requires progressive algorithms and highly skilled experts to run them. 

Despite the outstanding advancement of AI services, it is not that easy to find skilled, experienced, and reliable field experts who specialize in setting up machine learning systems.  

StageZero’s AI adoption in Europe 2022 report states that 56% of high-performing companies partner with data providers who collect real-world data. This factor of data acquirement distinguishes the high performers from the ones who are not so much ahead of the competition.   

Expenses and duration: 

Costs and deployment time can also be a challenge when it comes to building a voice assistant. 

Adopting an ASR system requires a long-term vision and a plan for managing the resources, capital, and time involved in implementing the system. 

Training language models for a voice assistant takes significant time and proficiency. Acquiring ample language resources, or effectively utilizing existing ones, can be quite expensive. Overall, fully manual development would put a significant strain on your finances. 

Lack of language training: 

Scarcity of multilingual knowledge is a big challenge for developing voice assistants.  

Most existing NLP innovations so far focus strictly on English. With over 7,100 languages in the world, developing NLP for all of them is extremely challenging. Since English is not a universal language and is not spoken fluently by all global users, 38% of users are reluctant to adopt voice technology because of this language coverage gap. 

If your business is implementing a voice assistant in a specific location, the ASR will easily fail if it hasn't been trained on language models for that location or region. Even when trained for the language, the ASR faces another roadblock: distinguishing between different accents and dialects to achieve precise interpretation. Ideally, a voice assistant should understand a user giving a command in their own accent and/or dialect, rather than only the more widely 'accepted' variety that the AI is more familiar with. 

Read more: Multilingual Natural Language Processing: solutions to challenges

How to overcome challenges 

Define capabilities: 

Businesses need to decide what the capabilities of their voice assistant should be. Should it recognize different speakers, or cater to just one person at a time? A clear decision from the start will help a lot with the implementation process later. 

Build a minimum viable product: 

Constructing a minimum viable product using existing technologies can be a good strategy. Several cloud providers, including Microsoft and Google, offer APIs for text-to-speech and speech-to-text, which can then be linked to large language models (LLMs).
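
As one possible minimal sketch of such an MVP, the snippet below chains the open-source SpeechRecognition and pyttsx3 Python packages around an LLM call. The ask_llm function is a placeholder for whichever provider's API you choose, and the free Google Web Speech endpoint used here is suitable for prototyping only.

```python
# Minimal voice-assistant MVP: speech-to-text -> LLM -> text-to-speech.
# Assumes `pip install SpeechRecognition pyttsx3 pyaudio` and a working
# microphone; ask_llm() is a placeholder for your chosen LLM API.

import speech_recognition as sr
import pyttsx3

def ask_llm(prompt: str) -> str:
    # Placeholder: call your LLM provider here and return its reply.
    return f"You said: {prompt}"

recognizer = sr.Recognizer()
tts = pyttsx3.init()

with sr.Microphone() as source:
    print("Listening...")
    audio = recognizer.listen(source)

# Free web endpoint, fine for prototyping; swap in a cloud STT API later.
text = recognizer.recognize_google(audio)
reply = ask_llm(text)

tts.say(reply)       # speak the LLM's reply back to the user
tts.runAndWait()
```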

Fine-tune and test data: 

Data for fine-tuning and testing can be acquired either from actual users or from a data partner. Obtaining data from a partner can save you a substantial amount of time: we are talking about getting data off the shelf instantly, compared to 6-12 months of collecting it yourself. 

Read more: The importance of data in voice assistant development 

Outsource to a data partner:  

Data security and privacy will be less of a concern if you work with a data partner, as they handle data privacy during training and testing. Later, however, you will need to make sure data is stored and processed according to the privacy regulations of the regions where your solution is available. You will likely need to consult a lawyer for that, though initially you can shortcut the process by following our checklist for privacy.

StageZero Technologies as your AI data partner

Allocate a monthly budget: 

Running machine learning models has a monthly cost, made up of the cloud resources used to run them: usually CPUs and GPUs for the different components. Different cloud providers have different pricing models, usually listed on their websites. 
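
As a back-of-the-envelope illustration of such a budget, the sketch below multiplies assumed instance counts by assumed hourly rates. Every price in it is a made-up placeholder; substitute the rates your own provider lists.

```python
# Rough monthly-cost estimate for running a voice assistant stack.
# All hourly rates below are illustrative placeholders, not real
# prices; substitute the rates from your cloud provider's website.

HOURS_PER_MONTH = 730  # average hours in a month

components = {
    # name: (instance count, assumed price per hour in USD)
    "ASR (GPU instance)": (1, 1.20),
    "LLM / NLP (GPU)":    (2, 2.50),
    "TTS (CPU instance)": (1, 0.35),
    "API gateway (CPU)":  (1, 0.10),
}

total = 0.0
for name, (count, rate) in components.items():
    cost = count * rate * HOURS_PER_MONTH
    total += cost
    print(f"{name:22s} {count} x ${rate:.2f}/h = ${cost:,.2f}/month")

print(f"{'Estimated total':22s} ${total:,.2f}/month")
```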

More data can help with language training: 

You will notice that your solution does not work in all cases and languages. At that point you will need to determine whether the performance is good enough or whether you need more data, as getting more data is the most reliable way of improving performance. You can see our list of suggested data partners here.

Read more: Collecting data for NLP models: what you need to be aware of and Real-world vs. synthetic data: Which to use in AI model training?

The future of voice assistants 

Currently, voice assistants are limited to specific functions. In the not-so-distant future, thanks to large language models, they will be able to assist with virtually any task. We see this as an inflection point when everyone will start using voice assistants and estimate that this is less than 12 months away. In other words, if you have had plans for developing your own solution or add-on to a voice assistant, now is really the time to start. 

Increasing integration: 

Voice assistants are likely to become more integrated into numerous tools and environments. They will likely be present in more types of appliances, such as wearable devices, making them even more accessible and prevalent in consumers' daily lives. 

Upgraded NLP: 

Improvements in NLP will potentially advance the capabilities of future voice assistants. Very likely, they will be better at comprehending context as well as solving complex commands, and able to take part in more meaningful and natural conversations. This means voice assistants will be more valuable and helpful.  

Growing functionalities: 

In the future, voice assistants will continue expanding their functions beyond current basic tasks. They will be upgraded to be able to handle more complex tasks, such as home automation, online shopping, health monitoring, and offering personalized recommendations across different domains. 

Personalized user experiences: 

We can expect voice assistants to become more adaptive and personalized to individual users. They will learn more from various factors such as user preferences, customer interactions, and behavior patterns to offer customized recommendations and experiences. This degree of personalization will stimulate deeper engagement that contributes to a smoother and more seamless user experience. 

Multilingual and cultural adjustment: 

Voice assistants are expected to become more capable in numerous languages and to adapt to different cultural settings. This will open up new markets and generate wider adoption globally, making voice assistants more accessible and inclusive for diverse communities. 

If this piques your curiosity, then please get in touch to discuss your project requirements with us.

Keep up to date with the latest news from the forefront of AI! Subscribe to our newsletter and follow us on LinkedIn. 

As speech analytics technology continues to evolve, the future is looking promising. Artificial intelligence (AI) and Natural Language Processing (NLP) are becoming increasingly important in the fields of customer experience and data-driven decision-making, fueling the adoption of speech analytics across multiple industries and use cases. As the potential of speech analytics becomes clearer to enterprises, their investments and adoption increase, fueling further evolution of an already powerful technology.  

Today we investigate our top 5 predictions of what the future might hold for speech analytics. 

1. Exciting new capabilities to fuel decision-making

The development of more advanced capabilities in AI, and in particular in NLP, will enable higher accuracy combined with more sophisticated analysis of speech data. This will lead to more valuable insights for users and enable them to make more effective data-driven decisions. 

Real-time analysis is a potential capability we could see hitting the scene thanks to greater processing power becoming available. As computing power increases, real-time speech analytics will be able to process greater quantities of speech data and make increasingly accurate predictions live. This will allow improvements in, for example, critical emergency response use cases, where immediate action can be taken based on real-time conversations. We predict we'll be able to recognize emotional states more accurately in real time too, meaning call agents will be able to provide immediate support and interventions in mental health crises. Overall, we expect real-time analytics to increase efficiency across the board, including basic call re-routing, transcriptions, and reducing the workload on human operators. 

As globalization continues, the capacity to handle speech in multiple languages will become more important for enterprises and their customers. Today the main languages of the world are relatively well served, but we expect to see similar growth soon for other languages. Customers are becoming more demanding and competition in the market is tough. As enterprises search for new ways to differentiate and serve their customers more appropriately, speech analytics will have to be available in local languages around the world. By analyzing speech in multiple languages, speech analytics tools will improve accuracy and provide more reliable insights into how different demographics interact with enterprises, allowing enterprises to make strategic decisions at a more targeted level. It will also support a greater diversity of contexts, reaching a wider audience and serving more people. 

In turn, we predict this will lead to more culturally appropriate deployment of speech analytics tools. Tools that can analyze speech in multiple languages will be able to provide insights into cultural nuances and variations in communication, helping to bridge the gaps in communication that we still see today. Even better, such tools may one day provide insights into global cultural trends and patterns of communication, enabling enterprises to make better-informed decisions about their operations and overall global strategic direction. 

2. Improved accuracy and reliability

Accuracy is already a crucial aspect of speech analytics, and we expect to see significant improvements on this, which in turn will make the tools more reliable and provide knock-on benefits. 

Insights from speech analytics tools can be expected to become more reliable and cohesive, covering more aspects of the speech data: its content, meaning, and context. This will prove particularly important for advancing fields like mental health, where accurate interpretation of spoken language is critical, but we expect it to provide more valuable insights to other fields too, such as customer service. As new languages are integrated into speech analytics tools, we'll see a wealth of improved insights into new markets and cultures, new customer bases, and ecosystems. 

We expect efficiency to skyrocket thanks to such tools. The market is already buzzing as employees and customers learn more about conversational AI in particular, and promises of efficiency have been a key driver of the hype. We predict the technology will deliver on its promises many times over, even in ways we can't yet imagine. The workload on the shoulders of human agents will be greatly alleviated, call handling times will be reduced drastically, and heavy strategic decision-making processes will lighten. Organizational processes will improve thanks to decisions based on data that considers multiple aspects of the business and reflects a deeper understanding of the needs of customers and employees. 

Indeed, as the data provides deeper understanding of customers' needs, the overall customer experience will improve too. We already see enterprises using speech analytics to find new ways to enhance the customer experience, and we expect this to keep growing over the next few years at least. Improvements in speech analytics will provide more accurate and, importantly, more relevant insights into interactions between enterprises and customers, allowing enterprises to make efficient, targeted improvements to every step of the customer journey. We foresee more personalized services starting to crop up, not only in terms of the languages served but also in the processing of calls, tone of response, and other such improvements. 

Read more: Ultimate guide to implementing Conversational AI in your business

3. Increased focus on emotional intelligence

Generally speaking, “emotional intelligence” refers to a person’s capacity to understand, manage and express emotions effectively. In the context of conversational AI, emotional intelligence can play a significant role in enhancing communication, as the person will have the feeling they’re communicating with someone who understands them and their emotions.  

Emotional intelligence will most likely play an important role in the future of speech analytics by improving the accuracy, personalization, empathy, and overall effectiveness across a host of applications, industries and use cases. We expect speech analytics to involve analysis of emotions during customer interactions as standard, which in turn should lead to higher perceived empathy and an improved customer-centric approach to customer service.  

As accuracy in emotion detection takes off, the technologies could be extended to new industries such as healthcare, especially with a focus on the currently underserved mental health sector. We expect to see emotional intelligence used for training speech analytics algorithms to detect emotions with higher precision, leading to more reliable insights and more targeted action, for example regarding mental health interventions. Services will also be more personalized: by analyzing cues such as tone, pitch, and volume, speech analytics tools will provide more personalized services that improve customer and patient experience and business outcomes. 

As communication becomes more nuanced, emotional intelligence will extend to areas such as understanding cultural differences. This will enable enterprises to serve a wider range of customer demographics more accurately and with a vastly improved customer experience. We also expect more accurate recognition of sarcasm and irony, which will lead to more effective communication with fewer misunderstandings, especially in sensitive situations. 

Overall, we can expect emotional intelligence to lead to higher levels of empathy when it comes to decision-making. When an enterprise has access to more information regarding the emotions of the customer, it can make better decisions to ensure the emotions remain positive throughout the entire customer journey with them – in turn, leading to better business outcomes. 

4. Emphasis on privacy, security, and ethical considerations

As a wider recognition of data privacy and security takes hold, speech analytics providers will be expected to invest more heavily in ensuring regulatory compliance and protecting sensitive customer information. This will involve a deeper focus on privacy and security regulations as well as more careful considerations of the ethics involved in speech analytics. 

Informed consent will continue to be considered a bare-minimum legal necessity, meaning that consent will be required from those whose speech is analyzed. They should be informed clearly of the purpose and scope of the analysis, and they must retain the right to opt out at any time. To protect individuals' privacy further, speech analytics tools can anonymize the data being analyzed, meaning that personally identifiable information (PII) such as names, addresses, phone numbers, and dates of birth is automatically removed from the data before any analysis. 
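
As a toy illustration of that anonymization step, the sketch below redacts a few common PII patterns with regular expressions. Production systems rely on far more robust named-entity recognition models; these minimal patterns are assumptions for demonstration and will miss many PII forms.

```python
# Toy PII redaction over a transcript using regular expressions.
# Real systems use trained NER models; these minimal patterns only
# illustrate the anonymization step and will miss many PII forms.

import re

PATTERNS = {
    "[PHONE]": re.compile(r"\+?\d[\d\s\-()]{7,}\d"),
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "[DATE]":  re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b"),
}

def redact(transcript: str) -> str:
    """Replace each matched PII span with its placeholder tag."""
    for placeholder, pattern in PATTERNS.items():
        transcript = pattern.sub(placeholder, transcript)
    return transcript

print(redact("Call me on +358 40 123 4567 or mail jane.doe@example.com by 01/09/2024."))
# -> Call me on [PHONE] or mail [EMAIL] by [DATE].
```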

Data security will also become stricter, and tools will be designed to ensure that all data analyzed is secure. The data needs to be protected from unauthorized access, internal or external, and must not be used or disclosed in any unauthorized way. Such protocols can involve encrypting the data to keep it secure in compliance with laws such as the General Data Protection Regulation (GDPR) in the European Union or the California Consumer Privacy Act (CCPA) in the United States. Furthermore, limiting data retention time can help minimize the risk of data breaches or unauthorized access: data should be retained only for as long as necessary to achieve the intended purpose and then disposed of properly, and we don't foresee or advise any changes to this practice. 

On top of the legal considerations, ethical considerations will remain a hot topic. As mentioned, data security is already a major concern for speech analytics, and as the technology becomes more sophisticated and data collection more widespread, the risk of privacy violations increases too. Enterprises should ensure compliance to avoid not only legal issues but also the ethical implications of breaches. Transparency, along with the aforementioned informed consent, is advisable: enterprises should be transparent about their processes to all individuals concerned and should protect against any misuse of the data they process. There is a risk that speech analytics tools could be misused for unethical purposes such as surveillance or manipulation; enterprises will increasingly be held accountable and should minimize any risk of such misuse.

We also predict that enterprises will be held increasingly accountable for bias and discrimination in the algorithms they use. Like any machine learning algorithm, speech analytics tools can be biased and lead to discrimination, and they could reinforce existing biases if they're not designed and tested with care and expertise. We have seen some famous recent examples of racism and sexism in different fields of AI, such as racism in healthcare and sexism in employment. As speech analytics tools become more widespread, it will become increasingly critical that they're designed and tested to be fair and unbiased. Using diverse training datasets can help to mitigate biased outcomes and enhance reliability – vigilance is key. 

Read more: Data diversity and why it is important for your AI models and StageZero's checklist to ensuring privacy compliance globally

5. Integration with different technologies and expansion to new industries

We predict that speech analytics will be integrated with other technologies such as chatbots, virtual assistants, and predictive analytics, leading to a near-seamless customer experience, clearer and more personalized communication, and smoother automation. As communication improves, so will decision-making, and both the customer and the enterprise will enjoy the benefits across a growing range of industries. While speech analytics has already been widely adopted in industries such as contact centers, we predict growth in other industries such as finance, retail, and hospitality, where it can provide insights into customer behavior and improve business outcomes. Here are the main industries where we expect to see developments. 

Customer service: 

Speech analytics will be used to analyze customer interactions and provide insights into preferences, pain points, and overall satisfaction levels, which can be leveraged to improve customer service and drive customer loyalty. 

Healthcare: 

Analysis of patient-doctor interactions will provide insights into medication adherence and treatment efficacy, which can be used to improve patient outcomes and reduce healthcare costs. Emotion recognition will be implemented in healthcare to help alleviate the mental health crisis in a cost-efficient way. Tools could analyze speech patterns to identify signs of depression, anxiety, or other mental health concerns, and this information could be used to allocate early intervention and treatment for people at risk. 

Education: 

Student-teacher interactions can be analyzed to provide insights into student engagement, their understanding of topics, and their educational progress. The insights can be used to improve learning outcomes and teacher performance, and even to personalize educational programs. 

Finance: 

Speech analytics tools will be used to analyze financial transactions and customer interactions with financial institutions, providing insights into fraud prevention and risk management as well as customer experience topics such as satisfaction. These insights are expected to reduce fraud and enhance product understanding for enterprises providing financial services. 

Marketing: 

Analyzing customer interactions with marketing campaigns will provide key insights into customer preferences and behaviors, improving campaign performance and driving sales. Analytics tools could also analyze social media posts and customer interactions to reveal insights into customer sentiment and behavior, improving social media marketing and customer engagement strategies. 

Legal: 

When it comes to analyzing legal proceedings, technology could provide insights into witness credibility, jury sentiment, and case outcomes. Such information could improve legal proceedings and allow for fairer outcomes. Real-time analysis could be used for allocating immediate law enforcement and security measures, reducing crime, and providing assistance more quickly. 

So, speech analytics has the potential to drive results across many industries, providing insights into customer behavior, patient health, student progress, financial transactions, marketing campaigns, and legal proceedings. As the technology continues to advance, it's likely that new applications will emerge to support specific use cases too. We can expect to see new and innovative applications emerging across a variety of industries, with the potential to improve customer experience, patient outcomes, employee development, and more. 
 


Are you looking for ways to enhance your speech analytics? Contact us to find out about our StageZero Speech Analytics Suite. 

History of speech analytics 

Speech analytics is a hot topic today, but its history dates back to 1952, when basic speech recognition first broke onto the scene: Bell Laboratories introduced "Audrey", a device that could recognize one voice speaking the digits 0-9. In 1962, IBM unveiled its "Shoebox" device, which could understand the digits plus six other words (minus, plus, subtotal, total, false, and off). But speech analytics as we know it today didn't really come to fruition until the early 2000s. 

In the early 2000s the global market saw a sharp uptick in call centers and call center activity, and the enterprises behind them understood that the customer experience there was valuable. To improve it, enterprises started to invest in speech analytics as a tool to diagnose key points in customer-facing conversations that could be refined. Initially this involved basic transcription of calls, followed by tagging them into groups by topic. The tags helped people identify recurring problems and common behavioral patterns during calls, which in turn helped to re-route calls to the appropriate agents. This technique proved useful for training new agents as well as for improving the customer experience overall. 

As the market started to see an increasing return on its investments in speech analytics, the technology itself was pushed to higher levels of sophistication. By the mid-2000s, sentiment analysis was added to the mix, allowing enterprises to monitor the emotions customers displayed during calls. This produced new insights into the customer experience, which in turn allowed enterprises to prioritize areas for improvement more easily than before. The success snowballed and fueled further evolution. 

This evolution continued on a strong path throughout the 2010s, as we witnessed further revolutionary developments in the field of natural language processing. The accuracy of speech analytics grew increasingly impressive, and today it is fully entrenched in the world of customer experience, covering multiple industries and use cases from sales to fraud prevention. 

Roots in voice recognition 

As we saw with examples like Audrey and Shoebox, speech analytics as a discipline has its roots in voice recognition technology. However, these roots are distant, and today the two are considered related but distinct technologies. 

Whereas voice recognition (sometimes referred to as speech recognition) converts speech into text, or allows devices to interpret speech, speech analytics analyzes the speech data itself. Voice recognition lets users interact with machines by using their voice: examples include the voice assistant on your smartphone, dictation apps that take notes for you, and devices in your smart or connected home that you activate and instruct by speaking. Speech analytics, on the other hand, entails monitoring and analyzing speech to identify patterns of behavior, helping companies modify their business processes with the goal of enhancing the customer experience. 

The end goal of the two technologies is therefore quite different, despite both dealing with speech. Voice recognition focuses on converting speech to a new format to enhance its usability, whereas speech analytics focuses on gaining insights into customers from conversations, in order to prioritize the customer experience improvements that will drive business outcomes. 

Read more: Where to get speech recognition data for NLP models? and The importance of data in voice assistant development

So what is speech analytics? 

As the name suggests, speech analytics involves analyzing speech data, and this speech can come from recorded speech or live conversations and utterances. Gartner defines speech analytics as software to “enable real-time and postcontact capture and analysis of service and support experience” (Gartner ID G00781108 Gartner Market Guide for Speech Analytics Platforms 22nd March 2023).  

Typically the speech analyzed comes from recorded or live phone conversations, voicemails, and other business interactions like sales calls and assorted varieties of customer service calls. Machine learning algorithms transcribe the speech and then analyze its content based on the users' requirements. The larger the pool of data available to the algorithm, the more accurate the outcomes will be. 

As well as providing a consolidated overview of the customer’s interactions with the enterprise, speech analytics can reveal relevant patterns in the topics of the conversations and the sentiments of the speakers, the frequency of certain words or phrases (known as “indexing”), and the tonality of the speakers. This allows enterprises to understand customers more deeply, and to make data-driven decisions when deducing the overall efficiency of communications. It allows them to monitor the performance of their agents, monitor compliance, experiment with new tactics, and to implement the necessary adjustments with the goal of improving the performance of voice interactions leading to the desired business outcomes. 
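
As a toy illustration of the indexing idea, the snippet below counts how often tracked words and phrases occur across a handful of transcripts. The phrase list and transcripts are invented examples; production tools do this at scale on automatically transcribed calls.

```python
# Toy "indexing": count tracked words/phrases across call transcripts.
# The phrase list and transcripts are illustrative; production speech
# analytics does this at scale after automatic transcription.

from collections import Counter

TRACKED_PHRASES = ["cancel", "refund", "thank you", "supervisor"]

transcripts = [
    "i want to cancel my subscription and get a refund",
    "thank you for the quick help, no refund needed",
    "please connect me to a supervisor, i asked for a refund twice",
]

index = Counter()
for transcript in transcripts:
    for phrase in TRACKED_PHRASES:
        index[phrase] += transcript.count(phrase)

for phrase, freq in index.most_common():
    print(f"{phrase!r}: {freq}")
# -> 'refund': 3, 'cancel': 1, 'thank you': 1, 'supervisor': 1
```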

Use cases for speech analytics 

The use cases for speech analytics are multiple and span countless industries. Here we will investigate a few of the more common use cases. 

Customer service was one of the initial use cases for this technology. Speech analytics monitors agent-to-customer interactions either in real-time or using “post-call analysis”, meaning that recordings of the interaction are analyzed after the call. Enterprises use this to keep an eye on their call agents to ensure they’re performing their best, and to ensure compliance with regulations. Feedback from speech analytics can assist in exploring new ways to reduce call-handling times, and can identify specific key words or phrases to avoid or include to enhance the outcomes of the calls. 

Similarly, quality assurance is a popular use case, since speech analytics technology allows enterprises to evaluate the overall quality of interactions on business calls. Here too, monitoring compliance with regulations is valuable: enterprises can evaluate whether agents have followed protocol regarding company policies and internal procedures, as well as laws such as the Health Insurance Portability and Accountability Act of 1996 ("HIPAA"). It can also be used for training and re-training agents. 

Healthcare is a popular use case in its own right, with speech analytics proving a useful tool for monitoring interactions between patients and healthcare staff, identifying patterns to enhance care procedures and outcomes, and even spotting new opportunities for improving patient outcomes in future cases. It has also proven popular as a tool for identifying actions that enhance patient satisfaction. 

Security and fraud detection is another use case that is reaping the benefits of speech analytics. Speech analytics can suggest risk rules that enterprises can implement to block, allow, or flag up certain words or phrases, with the goal of identifying and blocking fraud threats before they materialize. It can analyze indicators such as keywords or speech patterns, and based on predefined triggers, systems can alert agents in live time to different levels of fraud threat. This empowers agents to handle such threats earlier in the process, reducing the risk of fraud dramatically. 
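
A predefined trigger of the kind described might look like the sketch below. The keywords, weights, and thresholds are illustrative assumptions; real systems combine such rules with trained risk models.

```python
# Sketch of predefined fraud triggers over live-call utterances.
# Keywords, weights, and thresholds are illustrative assumptions; real
# systems combine such rules with trained risk-scoring models.

RISK_KEYWORDS = {
    "wire transfer": 3,
    "gift card": 3,
    "verification code": 2,
    "urgent": 1,
}

def fraud_alert_level(utterance: str) -> str:
    """Score an utterance against the keyword rules and map to a level."""
    score = sum(weight for keyword, weight in RISK_KEYWORDS.items()
                if keyword in utterance.lower())
    if score >= 4:
        return "HIGH"    # alert the agent immediately
    if score >= 2:
        return "MEDIUM"  # flag the call for closer monitoring
    return "LOW"

print(fraud_alert_level("It's urgent, please read me the verification code"))
# -> MEDIUM (score 3: 'verification code' + 'urgent')
```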

When it comes to commercial use cases, sales and marketing have long been a common use case for speech analytics. Sales agents can use it to identify the most effective messaging and tactics to use during calls with customers, and marketing can pick up on the key words and phrases that lead to the highest conversion rates, so that they can reuse those in their messaging.  

Finally, market research is a popular use case for speech analytics, as the technology can be used to gain key insights into the preferences of different markets and demographics. Their basic requirements can be identified, as can the frequency of the pain points they experience. Speech analytics makes it quick and easy to handle vast amounts of data on these topics, enabling enterprises to make more accurate decisions more quickly. 

As speech analytics becomes stronger across new languages and markets, we expect to see it increasingly deployed over more use cases in the near future too. 

Read more: What to consider before starting an AI project in your company?

Benefits of speech analytics

The benefits of speech analytics are tremendous and this is exactly why implementation continues to extend across new industries and use cases.  

Efficiency is increased by automating the entire process from transcription to analysis. This allows companies to make significant savings compared to manual analysis, and the results are far more accurate. Quality assurance is also improved, further enhancing efficiency and saving money. Speech analytics enables enterprises to identify which sales tactics are the most impactful and which messages and keywords perform best, directly generating greater success rates and higher revenue for the business. 

Performance improves greatly when speech analytics is used as a tool to monitor agent success. It can give agents feedback on their communication style, empowering them to identify improvement points and in turn improving their overall performance. Compliance is a key area of performance that speech analytics improves in particular: it can be used as a reliable tool to ensure compliance with regulatory requirements, internal policies, and so on, which in turn reduces the risk of lawsuits, fines, and other legal problems. 

Finally, speech analytics improves decision-making. By providing insights into customer behavior, needs, preferences, market trends, and other key factors, speech analytics enables teams to better understand their customers and markets and to make better decisions accordingly. Decisions around the customer experience benefit in particular, as speech analytics directly pinpoints customer pain points, improves call handling (e.g., re-routing of calls), and helps enterprises identify areas where more personalized solutions might benefit both the customer and the business. This improves decision-making efficiency and drives business growth. 

Potential drawbacks of speech analytics

Overall, speech analytics provides significant benefits but it’s important to remain aware of the potential drawbacks and to implement appropriate measures to mitigate them. Most drawbacks come down to basic common sense and would apply to technologies in general, such as concerns surrounding data privacy. Speech analytics can involve sensitive data that might arise for instance during conversations with customers. Enterprises are ultimately responsible for ensuring compliance with all relevant data privacy regulations and protection of any sensitive customer information. 

The actual adoption of the technology can prove problematic, and implementation can be costly, entailing significant upfront investment in the technology itself as well as its supporting infrastructure. Integrations with additional systems such as CRMs or marketing automation platforms can be pricey, but could be worth it for the more comprehensive overview of customer interactions they provide; they can also present technical challenges and may require additional resources. The success of the implementation also depends on internal adoption, which means staff will need adequate training and support to incorporate the technology smoothly into their workflows. 

Once speech analytics is adopted, its accuracy is critical for success. Depending on the accent and speech style of the speaker, many speech analytics systems fail to capture or analyze the speech accurately, producing misrepresentative results. Colloquialisms, slang, background noise, stuttering, code-switching, and other common factors can hurt accuracy. To mitigate this, it's important to make sure your speech analysis algorithms have been trained on a large quantity of good-quality training data. In their 2022 survey on the state of AI in Europe, StageZero found that 92% of companies struggle to access sufficient data for training their algorithms, so this is no small problem. Furthermore, the context of the speech is often missed entirely, since the algorithms pick up on the speech alone and not the body language, facial expressions, or other nonverbal cues, and this too can impact reliability. 

Overall, enterprises should take care not to rely too heavily on speech analytics, but rather use it as a tool. Human intuition is key, and human judgement remains a crucial component of analyzing customer interactions. Speech analytics can be a productive tool for providing valuable insights, but it does not replace the agent – not yet. As the technology becomes more and more advanced, there are ethical concerns to stay aware of moving forward: the use of customer data for targeted advertising is one area requiring serious ethical consideration. Speech analytics has its benefits, but enterprises must take steps to ensure they use it responsibly. 

Conclusion 

Based on the benefits and drawbacks, and the current market situation, most enterprises should be able to project positive business outcomes from implementing speech analytics successfully. The true value will depend on the cost of implementation, the size of the company, and the industry and use cases, among other factors, so it's advisable to conduct a thorough cost-benefit analysis to determine the true value on a case-by-case basis. The potential, however, is clear. 

In March, StageZero presented our first webinar, discussing our report "AI Adoption in Europe 2022: How high performers generate value", the first-ever large-scale survey on AI implementation in Europe. 

The webinar was hosted by StageZero CEO and co-founder Dr. Thomas Forss and Business Development Director Lesley Kiernan, and we were proud to welcome two keynote speakers: Dr. Magnus Westerlund, Principal Lecturer in Information Technology and Director of the Laboratory for Trustworthy AI at Arcada University of Applied Sciences, and Dr. Markus Weber, Senior Ink Technologist at Wacom. 

The full recorded session is available here, and the report is available for download here.

Lesley Kiernan: What was the inspiration behind the survey? What were you seeking to do and why? 

Dr. Thomas Forss: We had been looking around for information for a long time, especially for Europe, as that's our main market. We realized there aren't many AI data options here, so we started looking into how we could do it ourselves. 

We created a survey and we reached out to managers and experts within the field. We don’t really see any limitation or bias in this, but for next year, we’ll try to get more respondents.  

Lesley: Who were the participants that you decided to ask for responses in the survey? Who was the target audience?  

Thomas: The respondents were technologists and leaders with at least 3 years of experience in the field of machine learning and AI, for example lead or senior data scientists, heads of AI, and chief information officers. We used LinkedIn to reach out to them. 

Lesley: Why did you choose Europe specifically? 

Thomas: Europe is interesting because it is different from other regions. Europe is often compared to the US, which has the advantage of being a single big market with two main languages, English and Spanish. Europe, on the other hand, has many small countries with 'smaller' languages. In fact, I think in Europe we have more than 25 languages. 

Read more: Multilingual Natural Language Processing: solutions to challenges

Lesley: Let’s go over the key takeaways from the survey.  

Chart 6: Types of value reported in AI implementation (AI Adoption in Europe 2022, StageZero Technologies)

Thomas: In chart 6, we see the types of value reported in AI implementation by companies, evaluated on a scale from 1 to 5. One of the many interesting things here is that a lot of research or forecasts tell us that most of the AI implementation attempts fail. However, our report shows that companies get value out of them. The average value that companies reported here is quite high, for all categories of types of value. For me, this is quite a pleasant surprise.  

Chart 7: Do you experience a lack of data, and data of the right quality? (AI Adoption in Europe 2022, StageZero Technologies)

Thomas: 90% of respondents had issues obtaining a sufficient amount of data to train their algorithms. This high percentage also surprised me. The results show that this isn't just an issue for small companies; it occurs across the board. Lack of data is a clear problem. 

Read more: Data diversity and why it is important for your AI models

Chart 11: Would you develop more solutions if you had access to the right data? (AI Adoption in Europe 2022, StageZero Technologies)

Thomas: This is something that I experienced myself before. Usually in companies, there are a lot of new ideas, but quite often, we are limited to the data that is available. The results here perfectly match my experience.  

Chart 19: Organization and approach to AI/ML (AI Adoption in Europe 2022, StageZero Technologies)

Thomas: For this research, we especially tried to find out what high-performing companies focus on that others do not. We found that, at least when it comes to organization, they favor certain approaches, including hybrid and centralized styles. This concerns how they make decisions and how they spend their budgets. 

Chart 21: How do high performers acquire training data? (AI Adoption in Europe 2022, StageZero Technologies)

Thomas: For this very interesting chart, we asked respondents a few different questions, such as whether they have all the data they need – most do not – and how they acquire their training data. Here you can see that 75% of them get their data from their customers. The next most popular answer is that they create training data themselves somehow. But over half of these high performers also partner with data providers who collect real-world data. This approach to data acquisition distinguishes the high performers from those who are not so far ahead of the competition. 

Lesley: What are the main implications and recommendations that we should take away from this report? 

Thomas: Firstly, AI development is very complicated. Companies tend to focus on their core, meaning building the algorithms themselves, and partner with a third party such as a data provider, or use different types of platforms, for example data platforms or MLOps platforms. Secondly, companies with a clear AI strategy and a dedicated budget for ML/AI research and development have a higher chance of success, or at least get more value out of their implementations. All the high performers have come quite far with MLOps. 

In our next survey, StageZero plans to extend the research to Asia. Like Europe, that continent has many different countries and languages. I wonder if we'll see the same results as in Europe; maybe not. We also plan to get more respondents for even more specific segments and more defined results, and to discuss issues on a more granular level. 

Read more: What to consider before starting an AI project in your company?

Lesley: Moving to our panel discussion with our guest speakers: when we look at the survey results in general, how did they align with your day-to-day experiences? For example, a lot of the companies are expanding their budget for AI implementation next year; is this something that you recognize in your day-to-day lives? 

Dr. Markus Weber: Yes, obviously we are expanding the budget for AI, especially for cybersecurity. This can be a sensitive topic as cybersecurity threats can cause financial harm to companies. There is increasing investment in cybersecurity, data collection and annotation from our side. But of course, the global crisis is affecting the overall budget and management requires clear understanding of why the expanding budget is needed.  

AI technologies help solve problems that could not be solved previously. At Wacom, we are looking at pen and ink usage, especially for the education market, and trying to solve the problems that this market is having to face. We then need to identify the AI technologies that can solve these problems. When you improve the feasibility and the user experience, you simultaneously improve your product. For example, at Wacom we aim to enhance the value of pen and ink by improving the flow and our pen technologies using AI technology. It’s about improving the end-to-end experience and seeing how AI can contribute. 

Dr. Magnus Westerlund: Research funding is heading toward increased budgets for development in healthcare, the humanities, and the social sciences. I find it important that we are not only focusing on mere technologies but also on use cases like diagnosis and disease detection. There's a lot of financing currently available for research in this area – it's the biggest change I've seen in financing. 

Thomas: Looking at the leads we’ve been receiving at StageZero, I think that companies would be at more risk if they don’t invest in AI, rather than if they do.  

Lesley: One of the main issues that companies with an AI budget usually face is that they cannot find sufficient high-quality data. Is this something you also face in your day-to-day work? How do you handle accessing the right type of data? 

Markus: Yes. When it comes to data collection, the best case would be acquiring representative data from the users you want to support. For this, we need a legal framework, including user consent with full transparency about what you are trying to get from the data; we need to gain the users' trust so they are willing to share their data. After this, you need a standardized approach to annotate the data cost-effectively. It's not just about having enough data, but also about having meaningful, high-quality data to serve your training. High-quality data annotation is also a challenge, especially when dealing with complex data like our ink data. 

Magnus: I fully agree with Markus' assessment. We had a collaboration with a company, and it turned out that in order to annotate the data we would have had to bring in their senior expert for several months, which is not an efficient way to do machine learning. I think in the research field we should rethink data annotation and brainstorm how to create new datasets. Having said this, there are new directions coming, and one of them, self-supervised learning, is interesting. Recently, with large language models, we suddenly have an extreme amount of data: as long as the ML model can train itself, annotation is not needed for every sample. In most cases we have not had this possibility. In general, the way we deal with annotation is not efficiently developed, and that's what we should work on. 

Read more: Enterprise AI adoption: Top challenges and solutions to overcome them 

Lesley: There seem to be different challenges around the data itself – almost a quarter of the companies in the survey were using synthetic data to get around various project blockers. Markus, are you using synthetic data at Wacom, and what are your thoughts on it? 

Markus: Actually, there was a running joke in my team because we could never get enough data. It was really hard to get hold of the right data; it was painful. At one point we even tried to collect our own data ourselves. But mixing handwriting from different users doesn’t give the same quality of data – think of pen pressure or hand angle – which is especially problematic with handwriting. So, out of desperation, the team started to look at synthetic data: we had small sets on which we used a “ransom note” method, taking small parts of different sets and putting them together like ransom notes.   
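For illustration, a “ransom note” style augmentation can be sketched as a hypothetical Python toy that assembles new samples from fragments of several existing sets. The fragment identifiers are invented; as Markus notes, real mixed handwriting loses per-writer consistency such as pen pressure and hand angle.

```python
import random

# Hypothetical "ransom note" augmentation: assemble a new synthetic
# sample from fragments drawn across several existing sets.
# The fragment identifiers below are invented for illustration.

def ransom_note_sample(sources, length, seed=None):
    """Build one synthetic sample from randomly chosen fragments."""
    rng = random.Random(seed)
    # Each draw picks a random source set, then a random fragment from it,
    # so consecutive fragments usually come from different writers.
    return [rng.choice(rng.choice(sources)) for _ in range(length)]

writer_sets = [
    ["a1", "a2", "a3"],  # fragments from writer A
    ["b1", "b2"],        # fragments from writer B
    ["c1", "c2", "c3"],  # fragments from writer C
]
print(ransom_note_sample(writer_sets, length=5, seed=42))
```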

Lesley: How about IoT or edge computing? Is synthetic data used in those cases? 

Magnus: To some degree there has been a long history of using synthetic data, for example when it comes to network data and network traffic; we’ve been simulating data in this area for a long time. When we look at sensor data, with sensors that work in a repeatable fashion, we can define the changes in the data through statistical means. Quite a few companies are thinking about how to move forward with this, particularly regarding how to be mindful of privacy issues. What we have not seen yet is multimodal data – many different types of sensors describing the same phenomenon – where I have not seen many examples of synthetic data being used properly. The case of Tesla, for example, raised the question of whether synthetic data is needed for cars.  
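The statistical simulation of repeatable sensor data that Magnus mentions can be sketched very simply: fit basic statistics to real readings, then sample synthetic readings from that distribution. A hypothetical Python toy, with invented numbers:

```python
import random
import statistics

# Sketch of statistically simulated sensor data: fit simple statistics
# to repeatable real readings, then sample synthetic readings from the
# fitted distribution. All numbers are invented for illustration.

real_readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]  # e.g. temperature
mu = statistics.mean(real_readings)
sigma = statistics.stdev(real_readings)

rng = random.Random(7)
synthetic = [rng.gauss(mu, sigma) for _ in range(5)]
print([round(v, 2) for v in synthetic])
```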

Thomas: Having worked with fields including NLP and computer vision, I would say that all of these areas use some form of synthetic data; the question is which one uses it more, and speech and audio seem to use it the least. My guess is that this is due to a lack of technological readiness for creating data good enough to serve as synthetic training data. I have been in touch with companies that enhance or modify existing data to create synthetic data, similar to Markus’s ransom-note example at Wacom, but I have never heard of training with purely synthetic voice data. 

Read more: Real-world vs. synthetic data: Which to use in AI model training?

Lesley: There are also some questions regarding synthetic data and privacy compliance – it seems that one of our respondents’ greatest concerns is data privacy. Magnus, you mentioned that this was quite a surprise to you – could you elaborate a little on your conclusion there? 

Magnus: First of all, perhaps because personal data is very specific: most companies have no need to link personal information to a specific person. In that sense, I believed that mature companies would know how to do this and would already manage data privacy, so it should not be a major issue. But of course, perhaps they are also considering other types of privacy regulation, for example the upcoming AI Act.  

Lesley: StageZero has been dealing a lot with issues concerning GDPR, as well as legislation from many different geographies. Thomas, what are your thoughts on this? 

Thomas: The hope would be that the world would have one joint set of legislation, although in the short term that may not be possible. In the US, for example, there are currently different regulations in each state, and there are about one hundred regulations globally. Companies do not have the resources to make sure they follow all of them. 

Lesley: One last question for Markus: Wacom is an international company dealing with a lot of legislation, and there has been some talk that we could end up with about 10 different sets of legislation. Is it likely that this will really happen? And if so, how would you handle it? 

Markus: Yes, it is painful. To handle this, our internal legal team works with local partners: there are local law firms that specialize in data legislation for different regions. This comes back to the budget issue, because legal partners can be very costly.  

Read more: StageZero Technologies’ continued collaboration with Wacom

Lesley: As all of you are from an academic background, what role could, or should, European universities have in developing AI implementation in the region? 

Thomas: They should be one of the driving forces behind legislation and how we use AI – making sure that trustworthy AI, GDPR, and the AI Act are applicable to concrete scenarios, because I think there’s a risk if only lawyers from the EU draft these. 

Magnus: I fully agree with this, and it’s what we’ve been working on intensely. Over the last year we’ve been doing assessments of high-profile cases, and we launched an initiative that includes industry experts and academia. We tried to figure out a solution and see whether it’s trustworthy – and what trustworthiness actually means. That is one core thing. The second is that it’s necessary to educate people: not only engineers but also businesspeople need to truly understand what AI really means.   

Markus: We work a lot with research. I work with my former research institute on a lot of projects, and we have Master’s students doing their theses at Wacom. We also promote knowledge transfer from external research partners and experts in different areas within the company. It is important for knowledge to be understood by everyone in the company, including management and shareholders. 

StageZero are AI data specialists, based in Helsinki, Finland. We seek to lead the conversation on valuable and ethical AI implementation in Europe, and to highlight the true value of AI within the European context.  

Did you get the chance to fill out our survey on the State of AI in Europe 2023? We'd love to hear how companies like yours are handling AI implementation. 

Keep up to date with the latest news from the forefront of AI! Subscribe to our newsletter and follow us on LinkedIn. 

One of StageZero’s longest-running partnerships is a collaboration with Wacom – a company that provides cutting-edge digital ink solutions for a wide range of partners who use or produce smartphones, tablets, and digital stationery. By offering innovative products in the education field, Wacom’s digital ink solution helps teachers and students in their daily workflow. In particular, the solution provides teachers with insights they would otherwise lack due to the rise of remote teaching and learning. 

As a partner, StageZero helps Wacom by handling data annotation, using Wacom’s own data model for describing the contents of digital ink. 

For Wacom’s projects, StageZero collects and labels complex digital handwriting samples, including notes, drawings, and scientific formulas, with a scalable approach. With a network of over 10 million global contributors for annotation, the resulting datasets offered by StageZero are fully labeled, authentic, and diverse. This helps develop the next stages of AI automation in the digital ink space and contributes to Wacom’s improvement of their AI algorithms for semantic content recognition. 

StageZero partners with Wacom logos

In 2023, StageZero and Wacom continue their collaboration. Dr. Markus Weber, Principal Ink Technologist at Wacom, explains that “StageZero's flexibility and their professional setup in terms of data privacy made it easy for us to comply with our strict corporate policies regarding data privacy and cybersecurity. They understand our specific needs and are proactive in facilitating solutions." 

At StageZero we pride ourselves on our deep understanding of privacy and data protection.  We know companies can have multiple corporate policies they must adhere to, from their unique purchasing processes to specific legalities, and we’re proud to receive great feedback on our flexible approach to those. 

Read more: How to develop GDPR-compliant AI and How to ensure data compliance in AI development | StageZero checklist

Hear from our customers here. To find out more about how we could help with your specific use case, contact us.

In November 2022, StageZero CEO and co-founder Dr. Thomas Forss was invited as a panel speaker at Technology Day - Connected Ink 2022, an event hosted by Wacom, a long-term customer of StageZero. Connected Ink 2022 gathered creatives and thinkers from different industries and cultures to share ideas and inspiration on humanity’s evolution and innovation. 

Together with Heidi Wang, Senior Vice President of the Ink Division, and Dr. Markus Weber, Principal Ink Technologist at Wacom, in the session “Understanding the meaning of ink”, Dr. Thomas Forss explained how StageZero’s solutions support Wacom’s projects and its mission of making digital ink meaningful for artificial intelligence (AI) algorithms. Below is a recap of what was discussed during the talk: 

StageZero’s unique solution solves Wacom’s data challenge  

Wacom provides cutting-edge digital ink solutions for a wide range of partners who use or produce smartphones, tablets, and digital stationery. By offering innovative products in the education field, Wacom’s digital ink solution helps teachers and students in their daily workflow. In particular, the solution provides teachers with insights they would normally lack due to the rise of remote teaching and learning. 

To enhance their digital ink solution, Wacom needed an AI partner who could use Wacom’s own data model for describing the contents of digital ink. Usually, in this case, enterprises have two options: the first option is to start building an in-house team of hundreds of people who deal with data manually; the second option is to work with a reliable partner that can handle data, so that they can put their focus on their core business.  

StageZero Technologies partnership with WACOM

Obviously, option one involves very dull tasks for many employees, high costs for the company to onboard those employees, and multiple risk factors regarding the ramp-up time of the project and more. In Wacom’s case, this is where StageZero came in to help overcome the challenge. As a partner, StageZero assists Wacom by handling data annotation. 

For Wacom’s projects, StageZero collects and labels complex digital handwriting samples, including notes, drawings, and scientific formulas, with a scalable approach. With a network of over 10 million global contributors for annotation, the resulting datasets offered by StageZero are fully labeled, authentic, and diverse. This helps develop the next stages of AI automation in the digital ink space and contributes to Wacom’s improvement of their AI algorithms for semantic content recognition. 

“Ink is definitely different from images or text,” Dr. Forss emphasizes. “On a technical level, for instance, an image is represented by many pixels and every pixel has its value, or characters come together in groups to constitute a meaning. But digital ink is a different dimension of text: there are not only characters, but every stroke that is drawn is taken into account. Even the sequence in which a person draws the strokes of a character to create a word, or the pressure they use, is recorded as well. In short, there are so many dimensions to ink that differ from text that it becomes a much more complex problem – and that’s the problem StageZero is helping to solve.” 
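A rough, hypothetical data-structure sketch in Python shows why ink carries more dimensions than plain text: each character is a sequence of strokes, and each stroke point records position, pressure, and timing. The field names are invented for illustration and do not reflect Wacom’s actual ink format.

```python
from dataclasses import dataclass

# Hypothetical sketch of why ink is higher-dimensional than plain text:
# each character is a sequence of strokes, and each stroke point carries
# position, pressure, and timing. Field names are invented and do not
# reflect Wacom's actual ink format.

@dataclass
class InkPoint:
    x: float
    y: float
    pressure: float  # pen pressure at this point
    t_ms: int        # timestamp, so drawing order is preserved

@dataclass
class Stroke:
    points: list  # list of InkPoint

# The letter "t" as two strokes; the stroke order itself is a signal.
letter_t = [
    Stroke([InkPoint(0.5, 0.0, 0.8, 0), InkPoint(0.5, 1.0, 0.6, 120)]),    # vertical bar
    Stroke([InkPoint(0.2, 0.7, 0.7, 300), InkPoint(0.8, 0.7, 0.7, 380)]),  # crossbar
]
print(f"{len(letter_t)} strokes, {sum(len(s.points) for s in letter_t)} points")
```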

StageZero’s unique solution involves a crowdsourcing method for which we partner with over 10 million contributors. This diverse group executes bite-sized AI training tasks in exchange for different types of rewards. Hence, instead of maintaining hundreds of in-house workers, StageZero provides you with a pool of global users whom you can access 24/7 for significantly quicker turnaround times. 

StageZero specializes in two main fields, Conversational AI and Natural Language Processing (NLP), with a focus on data sourcing and data annotation. Within Conversational AI, StageZero mostly serves voice assistants, speech recognition, and chatbots. Within NLP, our strengths include text and handwriting, speech and audio, and digital ink. 

Read more: What to consider before starting an AI project in your company?

Future AI technologies for ink data roadblocks 

When asked to forecast emerging technologies that will use AI to solve ink data challenges in the future, Dr. Forss states: “In the text segment, there are these large language models – for example, you might have heard of GPT-3 and other types of underlying models that understand how people use language in text or speech. Something similar could be built for digital ink: you first train the algorithm with a large set of hundreds of thousands of notes so it can understand how people write. On top of that, in some cases you can utilize other methods with smaller samples to, for instance, recognize math or something else, and then use that to start predicting different types of labels. After that, you only need people to verify those labels, instead of having to do everything manually.” 
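The verify-rather-than-annotate workflow Dr. Forss describes can be sketched, under assumed data and thresholds, as a simple confidence-based router: the model’s confident predictions are accepted automatically, and only the uncertain ones are queued for human review. Sample names, labels, and the threshold below are invented for illustration.

```python
# Sketch of "predict, then have humans verify": route each model
# prediction either to auto-accept or to a human review queue based
# on confidence. Sample names, labels, and threshold are invented.

def route_for_review(predictions, threshold=0.9):
    """Split (sample, label, confidence) triples into two queues."""
    auto, review = [], []
    for sample, label, confidence in predictions:
        (auto if confidence >= threshold else review).append((sample, label))
    return auto, review

model_output = [
    ("note_001", "math_formula", 0.97),
    ("note_002", "free_text",    0.62),  # a human verifies this one
    ("note_003", "sketch",       0.91),
]
accepted, needs_review = route_for_review(model_output)
print("auto-accepted:", accepted)
print("sent to human verification:", needs_review)
```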

Dr. Thomas Forss CEO co-founder StageZero Technologies attending Connected Ink 2022 event by WACOM
StageZero CEO and co-founder Dr. Thomas Forss attending Technology Day - Connected Ink 2022 by Wacom

“Regarding data augmentation – meaning producing synthetic data from specific existing databases to train an AI model to produce handwriting – there has been some work done on this,” adds Dr. Weber. “You can use existing layouts to create new documents from just the existing databases. For instance, the concept of ransom notes: the same thing can be done with ink, producing more content based on an initial dataset that you have. 

Another trend is annotation-free training. An example would be using an artificial dataset to train your initial model, and then labeling some unseen content with pseudo labels. If the network is confident in those labels, you use them again to train your system so it can improve itself. The interesting part is that you’ll still need some ground-truth data, like an initial labeled dataset, for kick-starting.” 
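A minimal, hypothetical self-training loop illustrates this annotation-free idea: a small ground-truth set kick-starts the model, unlabeled samples receive pseudo labels, and only confident predictions are fed back into training. The tiny one-dimensional “model” below is purely illustrative.

```python
# Hypothetical self-training sketch: a small labeled seed set kick-starts
# a toy model, unlabeled samples get pseudo labels, and only confident
# predictions join the next training round. Illustrative only.

def train_centroids(labeled):
    """Fit one centroid per class from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Return (label, confidence); confidence shrinks with distance."""
    label = min(centroids, key=lambda c: abs(value - centroids[c]))
    return label, 1.0 / (1.0 + abs(value - centroids[label]))

labeled = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b")]  # ground-truth seed
unlabeled = [0.15, 0.55, 0.95]

for _ in range(2):  # a couple of self-training rounds
    centroids = train_centroids(labeled)
    still_unlabeled = []
    for x in unlabeled:
        label, confidence = predict(centroids, x)
        if confidence > 0.8:
            labeled.append((x, label))   # confident pseudo label kept
        else:
            still_unlabeled.append(x)    # stays unlabeled for now
    unlabeled = still_unlabeled

print("final training set:", labeled)
```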

Dr. Weber emphasizes that to realize these future technologies, the industry needs more contributors, especially developers who are interested in exploring new domains such as ink. “That’s why we also try to open up our services to the community. We really want to work jointly on some services – for instance, sharing some initial datasets so you can license data from the beginning. This aims to lower the barrier to starting out and to help enterprises that need specific, targeted data collection, such as Wacom. There are many factors: education differs between countries, and younger students may have different handwriting than those in higher grades – different target groups have different handwriting, so we need really targeted datasets. That’s why we want to make the job easier by providing the tool, because there’s no good tool out there; that’s why we started. We had the Universal Ink Model, and we tried to standardize the content schemas where we define the results from handwriting recognition, sketch recognition, and so forth.” 

Read more: Real-world vs. synthetic data: Which to use in AI model training?

The future of ink annotation technologies 

Wacom and StageZero share the vision of creating a developer community in the future – for Wacom, it would be used to build their own deep learning solutions fueled by ink content data. 

Answering the question “What’s your expectation for further scaling up the adoption of the ink annotation technologies we are building together?”, Dr. Forss stresses the need for more data: “…especially the need to get more data from the community – different stakeholders joining, partners who help build databases of content that can be labeled and used. My ambition for the future is to even build open-source datasets, so stakeholders can participate without having to invest too much in the beginning. They can experiment freely at the start, and once they know what they want to do with the data, they can start investing more.” 

two people working on a math exercise on a WACOM tablet using ink technology

Dr. Markus Weber gave the example of an initiative from the computer vision community: ImageNet, a dataset of millions of annotated images available for research. “I can envision something similar in the future, maybe an 'InkNet', contributed by research communities. With such a big public dataset, where people can contribute their data collections and make them openly available for other research, we’ll create an engaging community, and hence we can build much more and much better ink technology than has ever been seen.” 

Dr. Weber also highlights that in the long run, Wacom is going to need much more data, and that in research, a lot of initiatives start from communities.  

Collecting data from crowdsourcing 

Regarding the chances of obtaining handwriting data through crowdsourcing, Dr. Forss notes that this depends on the use case; with its crowdsourcing approach, StageZero specializes in speech data. “The main issue [for obtaining handwriting this way] would probably be that the ‘crowd’ needs to have their own tablets. Otherwise, it is easy to collect people’s handwriting when they write with their fingers on smartphone screens. In the future, I think we’ll come up with more innovations to solve that problem.” 

Link to watch the session: https://www.youtube.com/watch?v=RkuZpd2PmeQ&t=17072s

If this piques your curiosity, then please get in touch to discuss your project requirements with us. Otherwise, feel free to browse our off-the-shelf datasets here.
