Aug 23

How to develop GDPR-compliant AI

Artificial intelligence development keeps growing in popularity and sophistication, and it requires ever more data, often sensitive data, to function and predict. This brings AI squarely within the scope of the General Data Protection Regulation (GDPR). Since its implementation, the European privacy and security law has reshaped how companies handle data. Whether the issue is cookie policy or AI development, companies need to follow the principles of the GDPR; otherwise, they risk hefty fines.

Why GDPR applies to AI development

The GDPR was adopted by the European Union (EU) to protect the privacy of personal data. Organizations anywhere in the world are obligated to comply with the GDPR if they target or collect data related to individuals in the EU.

Each EU member state supervises GDPR compliance through independent data protection authorities. The GDPR came into effect in 2018, and companies in multiple industries have since received fines of up to €746 million. For example, Clearview AI was fined €20 million for failing to process personal biometric and geolocation data lawfully. The company also breached several fundamental principles of the GDPR, such as transparency, purpose limitation, and storage limitation (more on the principles later).

So while the GDPR does not explicitly mention AI, companies developing AI models must comply with the regulation: AI development uses personal data, and the GDPR protects that data.

How GDPR impacts AI development

Any personal data collected within the EU for AI development falls under the protection of the GDPR, regardless of whether your company is based in or outside of the EU. Compliance is mandatory whenever the data relates to individuals in the EU.

It is important to note that personal data collected before the GDPR took effect is not exempt: it can only continue to be used if its processing meets GDPR requirements, including a valid legal ground. The legal ground for collection must be presented to the person the data is collected from. In other words, the person needs to know for what purpose you are collecting the data, and you need a lawful basis for processing it, such as the person's consent or a contract with them.

The GDPR is based on principles that must be followed when developing technologies involving data. Most of these principles are also highly relevant to AI development. The principles provide data protection guidelines for organizations looking to develop AI models and collect necessary training data.

Purpose limitation

Purpose limitation means restricting the reuse of personal data for purposes other than the one it was collected for. For example, data collected for a customer service use case can be reused to send tailored marketing messages, but only if this is clearly indicated. Whether the reuse is legitimate depends on whether the new purpose is compatible with the original purpose of collection.

To avoid breaching the principle of purpose limitation, specify the purpose of data collection and usage in your data privacy policy. The policy should cover every intended use case at the time of collection.
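
One way to carry declared purposes into an AI pipeline is to store them next to each record and filter on them before any reuse. The following is a minimal sketch, not a compliance tool: the PersonalRecord class, the filter_for_purpose helper, and the purpose names are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersonalRecord:
    """A record that carries the purposes declared when it was collected."""
    subject_id: str
    payload: dict
    allowed_purposes: frozenset = field(default_factory=frozenset)


def filter_for_purpose(records, purpose):
    """Return only the records whose declared purposes include `purpose`."""
    return [r for r in records if purpose in r.allowed_purposes]


# Hypothetical example data: only the second person was told about marketing use.
records = [
    PersonalRecord("u1", {"ticket": "refund"}, frozenset({"customer_service"})),
    PersonalRecord("u2", {"ticket": "login"},
                   frozenset({"customer_service", "marketing"})),
]

marketing_set = filter_for_purpose(records, "marketing")
print([r.subject_id for r in marketing_set])
```

In this sketch, a record with no declared purposes is never returned, so data without documented purposes stays out of downstream datasets by default.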

Data minimization

At first glance, data minimization might not seem compatible with AI, as its models need lots of data to learn. But data minimization simply encourages a judicious approach to data collection, and it is about focusing on the quality of datasets rather than large volumes of data.

According to the GDPR, companies should not process any more personal data than is needed to reach their goals. To comply with the principle of data minimization, indicate in your privacy policy what personally identifiable information (PII) is collected, if any, and specify how long the data will be stored and under what conditions.
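
In practice, data minimization can start at ingestion: keep only the fields the model actually needs and record a deletion date alongside them. The sketch below assumes a hypothetical set of required fields and a hypothetical one-year retention period; both would come from your own privacy policy.

```python
from datetime import date, timedelta

# Assumptions for illustration only: the fields the model needs and the
# retention period stated in the privacy policy.
REQUIRED_FIELDS = {"age_band", "purchase_count"}
RETENTION = timedelta(days=365)


def minimize(record, collected_on):
    """Drop every field that is not required and attach a deletion date."""
    kept = {k: v for k, v in record.items() if k in REQUIRED_FIELDS}
    kept["delete_after"] = collected_on + RETENTION
    return kept


raw = {
    "name": "Ada Example",
    "email": "ada@example.com",
    "age_band": "30-39",
    "purchase_count": 4,
}
slim = minimize(raw, date(2024, 1, 1))
print(sorted(slim))
```

The name and email never reach the training set, and the delete_after value gives a scheduled cleanup job something concrete to act on.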

Automated decision-making

As the term suggests, automated decision-making is done without human involvement. Automated decisions can be based on factual, digitally created, or inferred training data.

Concerning AI, automated decision-making is the most commonly discussed GDPR principle. That is because AI implies automation by nature, and under the GDPR, individuals have the right not to be subjected to a decision solely based on automated processing.

When collecting data, companies are obligated to provide information about the logic, significance, and consequences of the decision to the persons whose data is being collected. This information has to be presented in clear and plain language.

Automated decision-making is closely related to profiling, which brings us to the next principle.

Profiling

Improvements in AI capabilities have dramatically increased the opportunities for profiling. Profiling refers to the automated processing of data to assess personal aspects of the data subject. It can be used, for example, to analyze a person’s economic situation or work performance. The data is then used to develop comprehensive user profiles, for example, for creating tailored advertising.

Some uses of profiling can lead to unwanted results. For example, Amazon had to scrap its AI recruitment tool after it was shown to be biased against women.

According to the GDPR, it is prohibited to subject someone to a decision based solely on automated processing where the decision has a legal or similarly significant effect on them. Such decisions must not be based on sensitive data such as race, religion, or health unless explicit consent is given or the processing serves a substantial public interest.
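
A common engineering response to this rule is to route any decision with a legal or similarly significant effect to a human reviewer instead of applying the model output directly. The sketch below is one possible shape for such a gate. The Decision class, the route function, and the 0.5 threshold are hypothetical, and deciding which decisions count as significant is a legal question the code does not answer.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    subject_id: str
    model_score: float
    significant_effect: bool  # e.g. a loan denial or a hiring rejection


def route(decision, threshold=0.5):
    """Send significant decisions to a person; automate only the rest."""
    if decision.significant_effect:
        return "human_review"
    return "approve" if decision.model_score >= threshold else "decline"


print(route(Decision("u1", 0.9, significant_effect=True)))
print(route(Decision("u2", 0.9, significant_effect=False)))
```

Here the model score for a significant decision is still available to the reviewer as input, but it is never the sole basis for the outcome.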

Fairness

Following the fairness principle, companies must not process data in an undisclosed or ill-intentioned way. AI models must not use data to generate adverse outcomes for those whose data was processed.

While it is rare for a company to knowingly employ unfair AI data practices, unfair data use can often occur unintentionally. For example, Twitter’s image cropping algorithm was criticized as racially biased after users noticed that the feature tended to focus on white faces over Black ones. Biased data breaches the principle of fairness and produces flawed AI models. You can reduce this risk in your AI development by using diverse training datasets.
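
Unintentional bias is easier to catch when it is measured. One simple check, which the GDPR does not prescribe and which is shown here only as an illustration, is to compare how often the model selects people from each group. The function names and the sample data below are hypothetical.

```python
from collections import defaultdict


def selection_rates(outcomes):
    """outcomes: iterable of (group, selected) pairs. Returns {group: rate}."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical model outputs for two groups of four people each.
sample = [
    ("a", True), ("a", True), ("a", False), ("a", True),
    ("b", True), ("b", False), ("b", False), ("b", False),
]
rates = selection_rates(sample)
print(rates, disparity_ratio(rates))
```

A ratio well below 1.0, as in this sample, does not prove unfairness on its own, but it is a signal that the training data or the model deserves a closer look.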

Transparency

According to the transparency principle, individuals must be fully aware that an AI system will process their data. Subjects must be well informed on data processing purposes and understand how the applied AI algorithm has come to a decision concerning them.

To ensure transparency when developing your AI model, you should provide the information above in your company’s privacy policy or elsewhere on the website.

Accountability

Whether because of a flawed setup or biased data, AI-based decisions always have the potential for negative outcomes. Proactive measures can greatly reduce these outcomes. Following the accountability principle of the GDPR, companies must implement strategies and procedures to anticipate and mitigate data privacy risks.

Carry out a Data Protection Impact Assessment (DPIA) to show relevant authorities that you have thought about risks and mitigation. The assessment will help reduce the probability of getting a fine if something goes wrong.

How to ensure GDPR compliance when developing your AI model

If you follow the GDPR principles listed above, you are already on the right track to ensuring compliance. Additionally, there are a few specific methods you can apply in your AI development:

Reduce the need to centralize training data by applying methods such as federated learning, which trains models where the data resides.
Protect personal data without shrinking the primary dataset by using differential privacy or homomorphic encryption.
Avoid the “black box” issue with methods such as explainable AI (XAI).
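
To make one of these methods concrete, here is a minimal sketch of differential privacy applied to a counting query, using the Laplace mechanism. It uses only the Python standard library. The private_count function, the sample ages, and the epsilon value are assumptions for illustration, and a production system would normally rely on a vetted differential privacy library rather than hand-written noise.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # expovariate(1 / scale) has mean `scale`; the difference of two such
    # independent draws follows a Laplace distribution with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical data: how many people in the dataset are 40 or older?
ages = [23, 35, 41, 58, 62]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
print(round(noisy, 2))
```

A smaller epsilon adds more noise and gives stronger privacy. The published answer stays close to the true count on average, while no single person's presence can be confidently inferred from it.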

But by far, the best way to avoid paying a sizable fine is to focus on the data. Use diverse, carefully selected datasets to prevent bias. For example, remove racial components in use cases where that could cause biased predictions. And whenever possible, train AI models on anonymized data.
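
As a starting point for preparing such data, the sketch below drops direct identifiers and sensitive attributes and replaces the email address with a keyed hash. Two caveats apply. First, this is pseudonymization rather than full anonymization: anyone holding the key can re-link the tokens, so the GDPR still treats the result as personal data. Second, removing a sensitive column does not remove other fields that may act as proxies for it. The field names and the pseudonymize helper are hypothetical.

```python
import hashlib
import hmac

# Assumptions for illustration only: which columns count as sensitive
# attributes and which as direct identifiers in this dataset.
SENSITIVE_FIELDS = {"ethnicity", "religion"}
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(record, secret_key):
    """Strip identifying and sensitive fields and add a keyed-hash token."""
    token = hmac.new(
        secret_key, record["email"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    dropped = SENSITIVE_FIELDS | DIRECT_IDENTIFIERS
    cleaned = {k: v for k, v in record.items() if k not in dropped}
    cleaned["subject_token"] = token
    return cleaned


row = {
    "name": "Ada Example",
    "email": "ada@example.com",
    "ethnicity": "prefer not to say",
    "purchase_count": 4,
}
safe_row = pseudonymize(row, secret_key=b"store-this-key-separately")
print(sorted(safe_row))
```

The token lets you link records belonging to the same person across tables without keeping the email address in the training data, provided the key is stored separately from the dataset.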

Whether you collect and use your own data or work with a data vendor, ensure that all privacy-related regulations, including the GDPR, are followed.

Need help? Reach out to StageZero if you want to know more about data collection and necessary regulatory compliance.
