In March, StageZero presented our first-ever webinar, a discussion of our report “AI Adoption in Europe 2022: How high performers generate value”, the first large-scale survey on AI implementation in Europe.
Hosted by StageZero CEO and co-founder Dr. Thomas Forss and Business Development Director Lesley Kiernan, the webinar featured two keynote speakers: Dr. Magnus Westerlund, Principal Lecturer in Information Technology and Director of the Laboratory for Trustworthy AI at Arcada University of Applied Sciences, and Dr. Markus Weber, Senior Ink Technologist at Wacom.
Dr. Thomas Forss: We had been looking for this kind of information for a long time, especially for Europe, which is our main market. We realized that there are not many sources of AI data here, so we started looking into how we could gather it ourselves.
We created a survey and reached out to managers and experts within the field. We don’t see any major limitation or bias in this approach, but next year we’ll try to get more respondents.
Thomas: The respondents were technologists and leaders with at least three years of experience in the field of machine learning and AI, for example lead or senior data scientists, heads of AI, and chief information officers. We used LinkedIn to reach out to our respondents.
Thomas: Europe is interesting because it is different from other regions. Europe is often compared to the US, which has the advantage of being one big market with two main languages: English and Spanish. Europe, on the other hand, consists of many small countries with ‘smaller’ languages. In fact, I think in Europe we have more than 25 languages.
Thomas: In chart 6, we see the types of value that companies reported from AI implementation, evaluated on a scale from 1 to 5. One of the many interesting things here is that a lot of research and forecasts tell us that most AI implementation attempts fail. However, our report shows that companies do get value out of them. The average value that companies reported here is quite high across all categories. For me, this was a pleasant surprise.
Thomas: 90% of respondents had some issues with having a sufficient amount of data to train algorithms. This high percentage also surprised me. The results show that this issue is not limited to small companies; it occurs across the board. Lack of data is a clear problem.
Thomas: This is something that I experienced myself before. Usually in companies, there are a lot of new ideas, but quite often, we are limited to the data that is available. The results here perfectly match my experience.
Thomas: For this research, we especially tried to find out what high-performing companies focus on that others do not. We found that, at least when it comes to organization, they focus on certain things, including hybrid and centralized organizational styles. This concerns how they make decisions and how they spend their budgets.
Thomas: For this very interesting chart, we asked respondents a few different questions, such as whether they have all the data they need – most do not – and how they acquire their training data. Here you can see that 75% of them get their data from their customers. The next most popular answer is that they create training data themselves somehow. But over half of these high performers also partner with data providers who collect real-world data. This approach to data acquisition distinguishes the high performers from those who are less far ahead of the competition.
Thomas: Firstly, AI development is very complicated. It seems that companies tend to focus on their core work, meaning building the algorithms themselves, and partner with a third party such as a data provider, or use different types of platforms, for example data platforms or MLOps platforms. Secondly, companies that have a clear AI strategy and a dedicated budget for ML/AI research and development have a higher chance of success, or at least get more value out of their implementations. All the high performers have come quite far with MLOps.
In our next survey, StageZero plans to extend our research to Asia. Like Europe, this continent has many different countries and languages. I wonder if we’ll see the same results as in Europe, but maybe not. We also plan to get more respondents for even more specific segments and more defined results, and to discuss issues at a more granular level.
Dr. Markus Weber: Yes, obviously we are expanding the budget for AI, especially for cybersecurity. This can be a sensitive topic, as cybersecurity threats can cause financial harm to companies. There is increasing investment in cybersecurity, data collection, and annotation on our side. But of course, the global crisis is affecting the overall budget, and management requires a clear understanding of why the expanded budget is needed.
AI technologies help solve problems that could not be solved previously. At Wacom, we are looking at pen and ink usage, especially for the education market, and trying to solve the problems that this market faces. We then need to identify the AI technologies that can solve these problems. When you improve the feasibility and the user experience, you simultaneously improve your product. For example, at Wacom we aim to enhance the value of pen and ink by improving the flow and our pen technologies using AI. It’s about improving the end-to-end experience and seeing how AI can contribute.
Dr. Magnus Westerlund: Research funding is heading toward increased budgets for development in healthcare, the humanities, and the social sciences. I find it important that we are not focusing only on the technologies themselves but also on use cases like diagnosis and disease detection. There’s a lot of financing currently available for research in this area – the biggest change I’ve seen in financing is here.
Thomas: Looking at the leads we’ve been receiving at StageZero, I think that companies are at more risk if they don’t invest in AI than if they do.
Markus: Yes. When it comes to data collection, the best case would be acquiring representative data from the users you want to support. For this, we need a legal framework, including user consent with full transparency about what you are trying to get from the data, and we need to gain users’ trust so they are willing to share their data. After that, you need a standardized approach to annotating the data cost-effectively. It’s not just about having enough data, but about having meaningful, high-quality data for training. High-quality data annotation is also a challenge, especially when dealing with complex data like our ink data.
Magnus: I fully agree with Markus’ assessment. We had a collaboration with a company, and it turned out that in order to annotate the data, we would have had to bring in their senior expert for several months, which is not an efficient way to do machine learning. I think that in the research field, we should rethink data annotation and brainstorm how to create new datasets. Having said this, there are new directions coming, and one of them, self-supervised learning, is interesting. Recently, with large language models, we suddenly have an extreme amount of data; as long as the ML model can train itself, annotation is not needed for every sample. In most cases, we have not had this possibility. In general, the way we deal with annotation is not efficiently developed, and that’s what we should work on.
Markus: Actually, there was a running joke in my team because we could never get enough data. We had small sets on which we used a “ransom note” method: taking small parts of different sets and putting them together like ransom notes. It was really hard to get hold of the right data, and it was painful. At one point we tried to collect our own data ourselves, but mixing handwriting from different users doesn’t give the same quality of data (think of pen pressure or hand angle), so in our desperation we started to look at synthetic data.
Magnus: To some degree, I think there has been a long history of using synthetic data, for example when it comes to network data and network traffic. We’ve been simulating data for a long time in this area. When we look at sensor data, with sensors that work in a repeatable fashion, we can define the changes in the data through statistical means. Quite a few companies are thinking about how to go forward with this, particularly about how to be mindful of privacy issues. What we have not seen yet is synthetic multimodal data, meaning many different types of sensors describing the same phenomena; I have not seen many examples of synthetic data that could be used properly there. For example, the case of Tesla raised the question of whether synthetic data is needed for cars.
Thomas: Having worked with fields including NLP and computer vision, I would say that all of these areas use some form of synthetic data; the question is which ones use it more, and speech and audio seem to use it the least. My guess is that this is due to a lack of technological readiness for creating data that is good enough to be used as synthetic training data. I have been in touch with companies that enhance or modify synthetic data to improve their algorithms, similar to Markus’s example of Wacom’s ransom notes. I have never heard of training on synthetic voice data alone.
Magnus: First of all, perhaps because personal data is very specific. Most companies have no need to link personal information to a specific person. So in that sense, I believed that mature companies would know how to do this and would already manage data privacy, and that it should not be a major issue. But of course, they may also be considering other types of privacy regulation, for example the proposed AI Act that is upcoming.
Thomas: The hope would be that the world would have one joint set of legislation, although in the short term that may not be possible. For example, in the US right now there are different regulations in each state. There are about one hundred regulations globally, and companies do not have the resources to make sure they comply with all of them.
Markus: Yes, it is painful. To handle this, our internal legal team works with local partners. There are local law firms that specialize in data legislation for different regions. This goes back to the budget issue, because legal partners can be very costly.
Thomas: They should be one of the driving factors behind legislation and how we use AI. Trustworthy AI, the GDPR, and the AI Act need to be applicable to concrete scenarios, and I think there’s a risk if only lawyers from the EU draft these.
Magnus: I fully agree with this, and it is what we’ve been working on intensely. Over the last year, we’ve been doing assessments of high-profile cases, and we launched an initiative that includes industry experts and academia. We tried to work out solutions and assess whether they are trustworthy, and what trustworthiness actually means. This is a core thing. The second thing is that it’s necessary to educate people: not only engineers but also businesspeople need to truly understand what AI really means.
Markus: We work a lot with research. I collaborate with my former research institute on many projects, and we have Master’s students doing their theses at Wacom. We also promote knowledge transfer from external research partners and experts from different areas within the company. It is important for this knowledge to be understood by everyone in the company, including management and shareholders.
StageZero are AI data specialists, based in Helsinki, Finland. We seek to lead the conversation on valuable and ethical AI implementation in Europe, and to highlight the true value of AI within the European context.
Did you get the chance to fill out our survey on the State of AI in Europe 2023? We'd love to hear how companies like yours are handling AI implementation.