Bootcamp AI-Agents - Transforming Organizations with AI Agents
AI Agents in Production
On 13 March 2026 I had the opportunity to give a talk at the FHNW AI Agents Bootcamp on the topic of "AI Agents in Production". The central question was: what must an AI platform deliver so that agents do not just work as a proof of concept but scale reliably in everyday business operations?
The talk opened with a concrete use case: key people leave the company and their knowledge goes with them. An AI assistant answers around 80 percent of queries directly from manuals and the knowledge base. Where that is not sufficient, an AI agent steps in, forwards the question to an expert and stores the new knowledge permanently for everyone. This setup works well for a single agent. But as soon as a second or third agent is added, the so-called day-two problem emerges: each agent brings its own knowledge base, its own access control, its own tests and its own model integration. That does not scale.
The road traffic analogy captures it well: a single car drives fine without traffic lights. But a hundred cars need traffic rules, signals, monitoring and roadworthiness checks, not faster engines. From the third use case onward, a platform is needed that solves these cross-cutting concerns once, centrally, rather than replicating them for every use case.
Module 1: Knowledge Base - Connecting Company Knowledge
The first building block is the central knowledge base. The problem is straightforward: a service agent must answer from manuals, tickets and process documents, but the language model does not know this content. Retrieval-Augmented Generation (RAG) closes that gap. Documents are split into small sections and stored as vectors. An incoming question is also vectorised, and the system finds semantically similar sections to pass as context to the model. The result: the model answers with company knowledge it never saw during training.
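The retrieval step described above can be sketched in a few lines. This is a toy illustration, not a production setup: the `embed` function is a bag-of-words stand-in for a real embedding model, and the sample chunks, function names and prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Assemble the retrieved chunks into the context passed to the model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical document sections, as they might come out of a splitter.
CHUNKS = [
    "To reset the pump controller, hold the red button for five seconds.",
    "Invoices are archived for ten years in the finance system.",
    "The pump filter must be replaced every six months.",
]
```

In a real system the vectors would come from an embedding model and live in a vector store, but the flow stays the same: vectorise the question, rank sections by similarity, pass the best ones as context.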
Today each agent often maintains its own database, the same documents are processed multiple times and results are inconsistent. A central knowledge base with clearly defined areas solves this once for all agents. New agents connect to it immediately, without additional effort.
Module 2: Governance - Who Can Do What?
The second module addresses a question that is often asked too late in practice: under whose identity does an agent operate when it accesses customer data, process documents and technical manuals? There are two fundamentally different approaches. The agent as assistant inherits the permissions of the logged-in user, like an intern working under a colleague's login with the user's maximum access. The agent as an independent entity, by contrast, has its own identity and its own rights, precisely the minimal permissions it needs for its task.
Agents as independent entities make AI governable: they follow the least-privilege principle, can work without a user context (for example in overnight batch processing or automatic ticket responses), see the same data regardless of who triggers them, which gives systematic testing consistent inputs, and leave clear, attributable traces and logs.
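The difference between the two identity models can be made concrete with a small sketch. The names `Principal`, `effective_scopes` and `may_access`, as well as the scope strings, are hypothetical and only serve the illustration; a real platform would delegate this to its identity and access management system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Principal:
    """A user or an agent, together with the data areas it may read."""
    name: str
    scopes: FrozenSet[str]

def effective_scopes(mode: str, agent: Principal,
                     user: Optional[Principal] = None) -> FrozenSet[str]:
    """Return the permissions an agent operates with in the given mode."""
    if mode == "assistant":
        # Agent as assistant: inherits everything the logged-in user may do.
        if user is None:
            raise PermissionError("assistant mode requires a logged-in user")
        return user.scopes
    if mode == "entity":
        # Agent as independent entity: its own minimal rights, no user needed.
        return agent.scopes
    raise ValueError(f"unknown mode: {mode!r}")

def may_access(scopes: FrozenSet[str], resource: str) -> bool:
    """Least-privilege check: access only to explicitly granted areas."""
    return resource in scopes
```

With a user who holds `{"manuals", "tickets", "customer-data", "hr-files"}` and a service agent limited to `{"manuals", "tickets"}`, the assistant mode can reach the HR files while the entity mode cannot, and only the entity mode works when no user is logged in.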
Module 3: Transparency - What Happened?
Imagine a service agent gives a technician the wrong repair instructions. The customer is angry. What happens next? Without transparency the answer is: we have no idea how this could have happened. No audit trail, no traceability, and trust is lost. With transparency you can reconstruct what happened: which documents were consulted, how the context was assembled, what the model produced and which intermediate steps were taken in longer workflows.
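What such a reconstruction needs is, at minimum, one structured record per request. The following is a minimal sketch of what that record might contain; the class name `TraceRecord` and its field names are assumptions for illustration, not a standard schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class TraceRecord:
    """One audit-trail entry per agent request."""
    agent: str
    question: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)
    documents: List[str] = field(default_factory=list)   # which documents were consulted
    prompt: str = ""                                      # how the context was assembled
    output: str = ""                                      # what the model produced
    steps: List[Dict[str, str]] = field(default_factory=list)  # intermediate steps

    def step(self, name: str, detail: str) -> None:
        """Record an intermediate step of a longer workflow."""
        self.steps.append({"name": name, "detail": detail})

    def to_json(self) -> str:
        """Serialise the record so it can be written to a log store."""
        return json.dumps(asdict(self), sort_keys=True)
```

A record like this answers the four questions above directly: documents, prompt, output and steps are each a field that can be queried after an incident.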
I posed a simple question to participants: would you run a business process that nobody can trace? Transparency is not a nice-to-have. It is the prerequisite for responsible AI operations.
Module 4: Evaluation - Testing AI Like Software
Transparency logs are valuable not only for operations; they are the foundation for systematic quality assurance. AI quality is measurable, not a matter of luck. Three ingredients make automatic measurement possible: a reference dataset of typical questions and verified answers, an LLM-as-judge approach (one language model evaluates the response of another) and custom evaluators for domain-specific criteria such as source correctness, compliance and tone. After every change, whether a new prompt, new documents or a new model, the evaluation runs again and shows immediately whether quality has risen or fallen.
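A minimal evaluation loop along these lines might look as follows. Everything here is a sketch under stated assumptions: `judge_stub` is a placeholder where a real setup would call a second language model, and the function names and the dataset keys `question`, `expected` and `source` are invented for the example.

```python
from typing import Callable, Dict, List

def judge_stub(question: str, expected: str, answer: str) -> bool:
    """Placeholder for LLM-as-judge. A real judge would send all three
    texts to a second language model and parse its verdict."""
    return expected.lower() in answer.lower()

def cites_source(answer: str, ref: Dict[str, str]) -> bool:
    """Custom evaluator: the answer must name the verified source document."""
    return ref["source"] in answer

def run_evaluation(agent: Callable[[str], str],
                   dataset: List[Dict[str, str]],
                   judge: Callable[[str, str, str], bool],
                   evaluators: List[Callable[[str, Dict[str, str]], bool]]
                   ) -> Dict[str, float]:
    """Run every reference question through the agent and return pass rates."""
    if not dataset:
        raise ValueError("reference dataset is empty")
    passed = {"judge": 0}
    passed.update({e.__name__: 0 for e in evaluators})
    for ref in dataset:
        answer = agent(ref["question"])
        passed["judge"] += int(judge(ref["question"], ref["expected"], answer))
        for evaluate in evaluators:
            passed[evaluate.__name__] += int(evaluate(answer, ref))
    return {name: count / len(dataset) for name, count in passed.items()}
```

Running `run_evaluation` before and after a change and comparing the two result dictionaries gives the rise-or-fall signal described above.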
One important point: agents with their own identity deliver consistent inputs and are what make systematic evaluation possible in the first place. The modules build on one another.
Module 5: LLM Gateway - Model Changes Without Chaos
Models are changed more often than you might think: because a provider deprecates a version, because a newer model offers better quality, because costs have dropped significantly or for legal reasons. Without a gateway, every agent is hard-wired to a specific model. As the number of agents grows, a necessary model switch quickly turns into chaos, especially when speed matters.
The LLM gateway decouples agents from specific models. Use cases address logical endpoints such as "thinking-large", "fast-small" or "embedding", and the gateway maps these to the currently active concrete models. A service agent, an onboarding agent and a compliance agent all communicate with the gateway, which routes requests to Claude Opus, GPT-4o mini or the embedding model depending on requirements. Cost control through central rate limits and budget caps is a further advantage of this architecture.
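The decoupling can be sketched as a small routing table with a budget check in front of it. The class `LLMGateway`, its methods, the flat per-request `cost` and the model identifiers used with it are all hypothetical; the `backend` callable stands in for whatever provider client the gateway would actually call.

```python
from typing import Callable, Dict

class BudgetExceeded(RuntimeError):
    """Raised when an agent would exceed its central budget cap."""

class LLMGateway:
    def __init__(self, routes: Dict[str, str],
                 backend: Callable[[str, str], str],
                 budget_per_agent: float) -> None:
        self.routes = dict(routes)        # logical endpoint -> concrete model
        self.backend = backend            # (model, prompt) -> completion
        self.budget_per_agent = budget_per_agent
        self.spent: Dict[str, float] = {}

    def switch(self, endpoint: str, model: str) -> None:
        """Change the model behind a logical endpoint; agents are untouched."""
        if endpoint not in self.routes:
            raise KeyError(f"unknown endpoint: {endpoint}")
        self.routes[endpoint] = model

    def complete(self, agent: str, endpoint: str, prompt: str,
                 cost: float = 1.0) -> str:
        """Route a request to the currently active model, enforcing the cap."""
        model = self.routes[endpoint]
        used = self.spent.get(agent, 0.0)
        if used + cost > self.budget_per_agent:
            raise BudgetExceeded(f"{agent} exceeded its budget")
        self.spent[agent] = used + cost
        return self.backend(model, prompt)
```

Agents only ever name "thinking-large", "fast-small" or "embedding". A deprecation or a price change then becomes a single `switch` call instead of a change in every agent.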
Module 6: PII Protection - Protecting Sensitive Data
The sixth module addresses a central concern in the Swiss enterprise context: personal data must not leave the organisation uncontrolled. A service agent handling customer queries deals with names, email addresses, customer numbers and sometimes credit card data. Three concerns follow. Depending on the provider, requests may be stored or used for training purposes. GDPR and the Swiss Data Protection Act both require the protection of personal data. And even without storage, the question remains whether sensitive data should be transmitted over the internet at all.
The solution: PII protection at gateway level, automatic, central and without any code changes in the individual agents. Before a request leaves the organisation, sensitive data is detected and handled: names are masked with placeholders, email addresses and IBANs are replaced, and credit card data blocks the request entirely. As with all other modules, the pattern is the same: solve it once, centrally.
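The three behaviours (mask, replace, block) can be illustrated with a small filter. This is a rough sketch, not a complete PII detector: the regular expressions are simplified assumptions, the `known_names` list stands in for the named-entity recognition a real system would typically use, and the names `protect` and `PIIBlocked` are invented for the example.

```python
import re
from typing import Iterable

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")

class PIIBlocked(Exception):
    """Raised when a request must not leave the organisation at all."""

def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to tell card numbers from other digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0

def protect(text: str, known_names: Iterable[str] = ()) -> str:
    """Mask names, replace emails and IBANs, block on credit card data."""
    text = EMAIL.sub("[EMAIL]", text)
    # IBANs are replaced before the card check so their digits cannot
    # be mistaken for a card number.
    text = IBAN.sub("[IBAN]", text)
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    for match in CARD.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            raise PIIBlocked("credit card data detected")
    return text
```

Because the filter sits in the gateway, every agent gets the same treatment without touching its own code.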
How the Modules Work Together
The six modules only realise their full value in combination. A complete request through the platform looks like this: the service technician asks a question. RAG searches the central knowledge base. Governance checks whether this agent is permitted to access the relevant data. Transparency logs which documents were consulted and which prompt was assembled. PII protection masks sensitive data in the request. The LLM gateway routes to the right model. And evaluation regularly checks whether quality is still on target.
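The order of these steps can be written down as a short orchestration function. This is only a sketch of the request path: the `Platform` container, its callables and the document fields `id`, `area` and `text` are assumptions for illustration, and evaluation is left out because it runs periodically rather than per request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Platform:
    """The shared modules, each injected as a plain callable."""
    retrieve: Callable[[str], List[Dict[str, str]]]  # knowledge base (RAG)
    is_allowed: Callable[[str, str], bool]           # governance: (agent, area)
    log: Callable[[dict], None]                      # transparency
    mask: Callable[[str], str]                       # PII protection
    complete: Callable[[str, str], str]              # gateway: (endpoint, prompt)

def handle_request(platform: Platform, agent: str, question: str,
                   endpoint: str = "fast-small") -> str:
    # 1. RAG searches the knowledge base; 2. governance filters the hits.
    docs = [d for d in platform.retrieve(question)
            if platform.is_allowed(agent, d["area"])]
    prompt = ("Context:\n" + "\n".join(d["text"] for d in docs)
              + f"\n\nQuestion: {question}")
    # 3. Transparency logs documents and prompt inside the organisation.
    platform.log({"agent": agent,
                  "documents": [d["id"] for d in docs],
                  "prompt": prompt})
    # 4. PII protection masks the request; 5. the gateway routes it.
    answer = platform.complete(endpoint, platform.mask(prompt))
    platform.log({"agent": agent, "answer": answer})
    return answer
```

Each agent calls the same `handle_request` path; only its name, its permitted areas and its preferred endpoint differ.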
Each module is built once and then works for all agents. That is the decisive principle: solve it once centrally, rather than replicating it for every use case.
Synthesis: The Platform Decides Whether AI Scales
The closing question of the talk brings everything together: do we solve this problem once centrally, or do we repeat it for every use case? A PoC is quick to build. But the platform determines whether AI actually scales in the enterprise.
The discussion prompts at the end sparked lively conversations: which module would you build first? How would you describe your current AI agents, as assistants or as independent entities? And what happens in your organisation today when an AI system makes a mistake? These questions struck a nerve with participants and showed that the path from successful proofs of concept to production-ready AI still lies ahead for many organisations.
