9th Datenschutztag AI & Data Protection

Data Flows in AI Systems

15.3.2024, 40-50 participants

On 15 March 2024 I had the privilege of speaking about data flows in AI systems at the 9th Datenschutztag AI & Data Protection, organised by HÄRTING Rechtsanwälte AG. The event was aimed primarily at data protection and information security professionals who face growing challenges around the use of artificial intelligence in their organisations. My talk covered fundamental questions raised by generative AI systems, particularly how data is captured, processed, and shared. I wanted to give a clearer picture of the complex data movements in AI systems and to show how companies can manage them safely and efficiently, laying the groundwork for a well-informed discussion alongside the more legally oriented talks of the day. Legislative initiatives worldwide, such as the EU AI Act, which define the prerequisites for safe, transparent, and responsible AI use, were also a central theme of the event.

Generative AI and the Role of LLMs

In the first section of my presentation I explained how Large Language Models (LLMs) work: they learn statistical patterns from enormous datasets and use those patterns to generate text, images, or other content on their own. I emphasised that this "generative" potential forms the basis for many innovative applications but also creates new risks. Data leaks can occur when sensitive information in training or prompt data is inadequately protected. This is a core challenge: with every new prompt and every interaction, confidential knowledge can be inadvertently disclosed. I also stressed that this risk does not originate from LLM technology itself but is a general problem of any data flow that involves third-party services.

Context and Prompt Structure

A further focus was the importance of "feeding" the AI the right information. Prompting means steering an LLM through targeted input that is embedded in a specific context. Using practical examples, I showed how decisive that context is for the quality and relevance of AI responses. The Retrieval Augmented Generation (RAG) method illustrates this particularly well: external data sources are indexed in a vector database and searched for the passages most relevant to a request, and those passages are then added to the prompt. The more precisely this context is prepared and the more skilfully prompts are formulated, the more accurately and safely the AI system can act. However, companies must ensure that no sensitive information reaches third-party services unchecked in this process.
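
The retrieval step of RAG can be illustrated with a minimal sketch. It is not the setup shown in the talk: the word-count "embedding", the function names `embed`, `retrieve`, and `build_prompt`, and the sample documents are all assumptions made for illustration, and a real system would use a learned embedding model and a vector database instead. What the sketch does show is the data flow: only the passages ranked most similar to the question end up in the prompt that leaves the organisation.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts (assumption)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that contains only the retrieved passages."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical internal documents, invented for this example.
DOCS = [
    "Invoices are archived for ten years.",
    "The cafeteria opens at eight.",
    "Customer invoices are stored in the finance system.",
]

if __name__ == "__main__":
    print(build_prompt("where are invoices stored", DOCS, k=2))
```

Because the prompt is built only from the retrieved passages, the retrieval step is also a natural place to check what is about to be sent to an external service.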

Challenges in Handling Data

In the third section I explained the hurdles companies frequently face when deploying AI systems. Chief among them are the limited context lengths of some generative models, which force companies to manage and prioritise the data they provide. Thorough validation of AI outputs is equally important, since LLMs are excellent at recognising and replicating patterns but do not have genuine factual knowledge. This can result in so-called "hallucinations": invented or incomplete information. The situation becomes even more sensitive when companies rely on confidential data that must be protected or anonymised. Using concrete case studies I illustrated how data sources can be designed for the highest possible level of security. Anonymisation and pseudonymisation concepts are indispensable here to comply with legal data protection requirements while still benefiting from the enormous potential of AI.
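
To make the pseudonymisation idea concrete, here is a minimal sketch, not one of the case studies from the talk. It assumes that email addresses are the only identifiers to protect and that a simple regular expression is good enough to find them; the names `pseudonymise` and `restore` and the `<EMAIL_n>` placeholder format are hypothetical. The mapping between placeholders and real values stays inside the organisation, so the external service only ever sees the placeholders.

```python
import re

# Simplified email pattern (assumption): real deployments need broader
# detection, e.g. for names, phone numbers, or customer IDs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def pseudonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email address with a stable placeholder.

    Returns the masked text and a placeholder -> original mapping
    that must be kept locally and never sent along with the prompt.
    """
    placeholders: dict[str, str] = {}  # original value -> placeholder

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if value not in placeholders:
            placeholders[value] = f"<EMAIL_{len(placeholders) + 1}>"
        return placeholders[value]

    masked = EMAIL_RE.sub(replace, text)
    mapping = {placeholder: value for value, placeholder in placeholders.items()}
    return masked, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a response from the AI system."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text

if __name__ == "__main__":
    prompt = "Contact anna@example.com or bob@example.org; anna@example.com prefers email."
    masked, mapping = pseudonymise(prompt)
    print(masked)
    print(restore(masked, mapping))
```

Reusing the same placeholder for repeated values keeps the text coherent for the model, and because the mapping is reversible this is pseudonymisation rather than anonymisation: the masked data still counts as personal data for whoever holds the mapping.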

Discussion and Outlook

The subsequent discussion round made clear that the legal framework, not only in Europe but also in the USA and other regions, strongly influences how companies deploy AI technologies. Data Protection Officers (DPOs) and Chief Information Security Officers (CISOs) play a key role in risk analysis, supplier management, and the protection of critical infrastructure. Many audience members shared their experiences with AI solutions and raised valuable questions, ranging from technical details about vector databases to ethical considerations on the use of AI.

To conclude: the future of artificial intelligence depends not only on technical progress but to a large extent on responsible data management. Anyone who wants to use AI efficiently and safely must engage deeply with data flows in AI systems and develop an awareness of the possible risks. At the same time the technology holds enormous potential for innovation and value creation, provided all stakeholders, from developers and data protection experts to executives, work closely together and have clear guidelines at hand. I hope participants take these insights away from the 9th Datenschutztag AI & Data Protection and apply them in their own organisations, so that they can fully harness the opportunities of AI while preserving data protection.
