AI and data protection - What do I have to consider?

Technologies based on artificial intelligence (AI) and automated document processing have become an integral part of the modern business world. Combined, they enable automated workflows that were barely imaginable a short time ago. However, like any other technology, AI carries risks. Concerns are increasingly being voiced, particularly with regard to data protection. To address these concerns and make it possible to use AI with confidence, this blog article takes a closer look at data protection in the context of AI.

How does an AI work?

AI systems depend heavily on data. Algorithms need it to learn, make decisions and gain additional insights. It is important to know that the processed data is not simply collected in one large database, which would quickly become a data protection nightmare. Instead, the data is analyzed statistically and the AI learns from the combination of data: order, frequency, position in the document and other factors are analyzed and evaluated.
In intelligent document processing, this means that by processing new documents, models learn to classify them better in the future and to extract data from them more reliably. Such documents can contain personal information such as names, addresses or financial data of customers, but also confidential information exchanged between companies in a B2B context, or simply other text.
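
To make the idea of statistical analysis more concrete, the following is a minimal sketch of how order, frequency and position could be derived from the text lines of a document. It is purely illustrative and does not describe natif.ai's actual pipeline; the function name `token_features` and all feature names are assumptions made for this example.

```python
from collections import Counter


def token_features(lines):
    """Derive simple statistical features for every token in a document.

    `lines` is a list of text lines. The feature names are hypothetical
    and only illustrate the factors mentioned in the text.
    """
    # Split each line into tokens and remember where each token was found.
    tokens = [
        (tok.lower(), line_no, col)
        for line_no, line in enumerate(lines)
        for col, tok in enumerate(line.split())
    ]
    counts = Counter(tok for tok, _, _ in tokens)
    total_lines = max(len(lines), 1)

    features = []
    for order, (tok, line_no, col) in enumerate(tokens):
        features.append({
            "token": tok,
            "order": order,                     # reading order in the document
            "frequency": counts[tok],           # occurrences in this document
            "rel_line": line_no / total_lines,  # vertical position, 0 = top
            "column": col,                      # position within the line
        })
    return features


doc = ["Invoice No. 4711", "Total 119.00 EUR", "Invoice date 2023-05-01"]
feats = token_features(doc)
```

A model trained on features like these learns, for example, that a word such as "invoice" near the top of a page is a strong signal for a particular document type.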

AI and data protection in general

If an AI is trained, or improved during operation through continuous learning, it is highly likely that personal data will be processed as well. Even the name of a contact person on an invoice, the personnel number on a travel expense report, or the license plate number on a fuel receipt is personal data. If this personal data is processed in the EU or EEA, the General Data Protection Regulation (GDPR) applies. It also applies if the data is processed in a so-called third country (i.e. outside the EU/EEA) but the processing is offered as a service to people in the EU or EEA, that is, if the service recipients are located there.

The last case in particular (processing in a third country) is challenging because EU law is difficult to enforce there. For such cases, the GDPR therefore provides for special contracts whose wording is specified by the EU Commission and may not be changed: the so-called Standard Contractual Clauses (SCCs).

Back to AI: if personal data is used to train an AI, or is simply processed by an AI, companies in the EU must comply with the GDPR and must select service providers that do the same.

This is one of the reasons why, for example, the AI chatbot ChatGPT is currently the subject of so much discussion in the media and among experts.

Without going into too much detail: the operator of ChatGPT, the company OpenAI, is located in the USA and thus in a third country. In 2020, the European Court of Justice (ECJ) invalidated an agreement between the EU and the USA that had regulated the exchange of personal data between these two jurisdictions. Since then, the USA has had to be treated as a “normal” third country, and the hurdles for processing personal data there have increased enormously. One of the reasons for the ECJ's decision was that the access US intelligence services have to the data was (and currently still is) almost impossible to control. At the same time, data subjects had no way of exercising their rights under the GDPR and no legal protection of any kind with regard to their data.

AI training and the legal basis

But even if the processing takes place in the EU, training an AI with personal data requires a legal basis. If the users are also the clients, this can be regulated by contract. However, if the client is the employer of the users, we have a classic case of commissioned processing, provided that the processed data includes personal data. In most cases this cannot be ruled out, and in document processing it must even be assumed. A data processing agreement in accordance with Art. 28 GDPR is therefore required.

A processor, however, may not simply use the data provided to it for its own purposes. Yet this is exactly what training an AI is: a processing purpose of the AI operator's own.

Data protection at natif.ai

At natif.ai, all data processing takes place in the EU and is subject to the high standards of the GDPR. No data is transferred to unsafe jurisdictions, and none to the USA either. All of the technology used at natif.ai is located not just in the EU, but specifically in Germany.
It goes without saying that natif.ai concludes a data processing agreement with its customers, as required by the GDPR.

Processing with existing AI models
If documents are processed with existing models, natif.ai keeps each document together with all processed parameters for 14 days.
After 14 days, all data is deleted. The data can also be deleted earlier at any time via the API (all information can be found here).
If the documents are useful for improving the existing models used by the customer, they are stored as training data for the duration of the contract in order to improve the model accordingly. If the contract is terminated, this data is deleted as well.

Processing with self-trained models
In the case of self-trained models based on your own documents, natif.ai retains the data for as long as the contractual relationship exists or until the customer deletes their individual model; this ensures that the model can be used properly. As soon as the customer account or the annotation project (which forms the basis of the model) is deleted, all associated data is automatically deleted as well.
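
The retention rules described in the two sections above can be summarized as a small decision function. This is a hedged sketch for illustration only, not natif.ai's implementation: the names `StoredItem`, `must_delete` and the `kind` values are hypothetical, and only the 14-day period and the contract and model conditions are taken from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Retention period for documents processed with existing models (from the text).
PRODUCTION_RETENTION = timedelta(days=14)


@dataclass
class StoredItem:
    kind: str                         # "production", "training" or "custom_model" (assumed labels)
    created_at: datetime
    deleted_by_customer: bool = False  # customer requested deletion, e.g. via the API


def must_delete(item, now, contract_active, model_exists=True):
    """Return True if the item may no longer be kept under the rules above."""
    if item.deleted_by_customer:
        return True
    if item.kind == "production":
        # Production documents and their parameters are kept for 14 days.
        return now - item.created_at >= PRODUCTION_RETENTION
    if item.kind == "training":
        # Training data for existing models is kept for the contract duration.
        return not contract_active
    if item.kind == "custom_model":
        # Data for self-trained models is kept while contract and model exist.
        return not (contract_active and model_exists)
    raise ValueError(f"unknown item kind: {item.kind}")


now = datetime(2023, 6, 15)
recent = StoredItem("production", now - timedelta(days=13))
expired = StoredItem("production", now - timedelta(days=14))
```

With these values, `must_delete(recent, now, contract_active=True)` is false, while the same call for `expired` is true.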

All advantages at a glance

Advantage no. 1:
All processing takes place in the EU and is therefore subject to the high standards of the GDPR. No data is transferred to unsafe jurisdictions, and none to the USA either.

Advantage no. 2:
Of course, natif.ai concludes a data processing agreement with its customers, as required by the GDPR.

Advantage no. 3:
The customer retains data sovereignty. Customer data is only used for training for as long as the customer relationship exists. Production data is stored for a maximum of 14 days. The customer can have the data deleted at any time via the API.

Thus, natif.ai meets the high requirements of the GDPR and offers its customers a service using state-of-the-art AI.

This AI learns on a customer-specific basis so that all the advantages of a learning AI can be used. At the same time, the use of the data for the continuous learning of the AI is unproblematic, as the data is used entirely on behalf of the customer.

This article was written jointly with our data protection officer.