In 2022, StageZero conducted the first large-scale study into the adoption of artificial intelligence (AI) across the European continent. We surveyed technologists such as data science and machine learning (ML) leads, engineers, and decision-makers across multiple industries and business functions to find out how they and their companies are implementing and benefiting from AI. The results were published in a comprehensive report, free to download here.
The survey sought to identify what Europe’s market-leading enterprises do differently to stay ahead of the competition when it comes to AI implementation, and one factor stood out: they partner with external companies for data acquisition. Here we investigate how they handle data acquisition differently, and why.
Our report surveyed companies of all sizes and backgrounds across the European continent to gain insight into what is really happening “on the ground” in the European AI ecosystem. We classified respondents by their self-assessed competitiveness, regardless of size, revenue, or other factors: each respondent was asked to rate their company’s performance in adopting AI against their perception of their competitors’ performance.
43% of those we questioned stated that they believe they’re ahead of their competitors. 32% estimated that they’re about level with their competitors when it comes to AI implementation, and 24% stated that they perceive themselves to be lagging behind. In the report, the top performers are the 43% who rate themselves as leading the way in AI adoption. We’ll take those companies as the leaders and explore what they’re doing differently from the others when it comes to obtaining their ML training data.
When it comes to implementing AI in their projects, an overall scarcity of data was cited as a main challenge by the majority of companies we surveyed. 62% said that they would be able to develop more solutions than they do today if they had access to the right training data, indicating that the lack of available data is a blocker. 67% reported that a lack of diversity in the available data is preventing them from seeing the true value of their ML projects. We were therefore curious to see what the high performers are doing differently to obtain data of sufficient quantity and quality.
Companies acquire their data through a variety of means. Most of the high performers (75%) use internal data collected from their customers. 69% of the high performers create datasets themselves, while 56% reported partnering with third parties to obtain the relevant data. It is understandable that over half of the high performers partner with third parties to obtain training data: third parties widen the scope of collection, which increases the diversity of the data and can therefore reduce bias in the resulting algorithm.
Using synthetic data is not a widespread practice among the high-performing companies. Only 6% of high performers reported collaborating with partners to get synthetic data (among other sources), and 25% reported creating their own synthetic datasets, which suggests that real data is the preferred option for the high performers.
Synthetic data might be used in cases where there is a distinct lack of real data, or where the real data can’t be used for reasons of regulatory compliance. For example, some data contain personal information, the processing of which is restricted under the GDPR. Since we interviewed European companies, this is a very likely scenario.
However, synthetic data has its drawbacks. Synthetic data typically misses some of the outliers that might be present in real datasets, since it can only mimic real data, not replicate it, and sometimes those outliers are important to the reliability of the algorithm. Synthetic data is often praised for avoiding bias, but if the source data contains bias, the synthetic data is likely to reproduce it. There is also a risk that, in an attempt to produce a fair synthetic dataset, the data is manipulated in ways that make it less accurate.
“Synthetic data represents a great solution for outlying cases where the real-life data might be lacking or sensitive. But synthetic data has its limitations. Depending on the use case it can be less accurate, and generally speaking it works best when used to fill gaps in the real-life datasets. Then companies should consider, can they make the synthetic data look close enough to the real data for it to be useful? You won’t know until you try, and that’s time-consuming,” explains Thomas Forss, CEO and founder at StageZero Technologies.
Read more: Real-world vs. synthetic data: Which to use in AI model training? and StageZero's guide and checklist to privacy and AI
The high-performer group was more willing to rely on partnerships to solve issues with their data, such as sourcing ML training data. The vast majority was clear on this topic, with 81% of the high performers saying that they would “definitely” or “probably” work with third parties to solve possible data issues. The other respondents were also open to working with third parties, but to a lesser extent (67%).
The clear conclusion here is that the high performers are more open to collaborating with third parties on data issues. More speculatively, it may also suggest that enterprises that are more flexible about processes such as partnering with third parties have a higher success rate in their AI implementations.
“I’m not very surprised to see that the market leaders are choosing to partner with third party experts. Our customers report solid benefits to partnering with us and the trend in the market is clear,” explains Lesley Kiernan, commercial director at StageZero Technologies. “As the market matures, the benefits will become even clearer, so I expect to see more companies choosing to partner with experts.”
Indeed, there are multiple benefits to partnering with an expert. When it comes to data acquisition, we can assume that high performers have evaluated different processes and have landed on partnering with a third party as a reliable choice. Here we explore some of the benefits that such a partnership can provide.
The most obvious benefit of partnering with an expert third party is the opportunity to extend the company’s expertise. The partner gives the company access to specialized knowledge that it might be lacking in-house, usually at a lower cost than building that expertise internally. In turn, the company can improve the quality of the services it provides and even offer new services or products. For instance, a company looking to expand the market share of its application in Western Europe would benefit from localizing the experience into local languages. StageZero provides training data for exactly such use cases, allowing customers to extend their reach to new languages and markets.
Partnering with a competent third party allows the enterprise to streamline its operations. Effectively, the customer can outsource the more complicated (and arguably less interesting!) aspects of machine learning projects, such as data acquisition and verification, leaving more time in-house to focus on core competencies. Our customers estimate that data acquisition and verification represent an average of 80% of project time. When they outsource that to StageZero, they can use that time for other tasks and enhance their competitiveness.
Generally a company’s overall flexibility increases thanks to partnering with an external expert. Partnerships provide the opportunity for very fast scale-up (or down) depending on fluctuating market conditions, enhancing the enterprise’s reactivity to the economic climate. Today this is more important than ever. A company might not have the need or budget for a full-time data team, and outsourcing such tasks to industry experts provides a flexible and more cost-effective solution.
Partnering also provides the enterprise with the opportunity to share risk, effectively reducing it. When responsibilities and risks are shared with an external party, the likelihood of project success increases. StageZero is proud to help companies outside of the European Union comply with data privacy and security regulations such as the GDPR. As one of the few fully GDPR-compliant data service providers in Europe, StageZero is well positioned to support enterprises on their data security.
Another reason we imagine leading companies prefer partnering is that it can increase the enterprise’s competitiveness, allowing it to offer new products or services that it might not previously have been able to bring to market. StageZero specializes in helping companies scale their products and services to new languages and markets, helping them to extend their market share and maintain competitive customer retention rates.
Overall, the leader group demonstrated their understanding that partnering with a third-party expert like StageZero to obtain training data services can help them to improve their operations, reduce risk, enhance their competitiveness, and increase their chances of success in the long run.
Read more: How to develop GDPR-compliant AI and How to ensure data compliance in AI development | StageZero checklist
Psst! Did you miss our webinar discussion on AI adoption in Europe? Catch up with our expert panel discussion here to find out what else the leading enterprises are doing differently to keep ahead of the trends.