Marius Högger

bbv AI Webinar - Robust AI Foundation

Webinar: Robust AI Foundation

20.11.2025, approx. 30 participants

As part of the bbv webinar series on artificial intelligence, this episode from 20 November 2025 focuses on AI infrastructure as the foundation for modern applications. The session is moderated by Emre; I contribute practical insights into the building blocks organisations need for stable and scalable AI operations. The focus is less on individual AI use cases and more on the reusable platform components that enable model changes, cost control, data integration, quality assurance and traceability. The goal is a sober assessment of how AI infrastructure, as an organisational and technical foundation, supports AI solutions that remain viable in the long term.


AI Infrastructure as the Foundation for Scalable Applications

When the term "AI infrastructure" comes up in projects, many people first think of GPU clusters, compute power and the cloud. In practice, however, the bottleneck often lies elsewhere: AI only becomes sustainably usable when there is a cross-use-case foundation that solves recurring requirements centrally, regardless of whether the end result is a chatbot, an assistance system or a background agent.

"Today we are talking about the hob, the oven and that kind of thing."

To stay with the kitchen image: infrastructure is not the finished dish but the equipment, the base on which multiple AI applications can reliably be built. In the blueprint, AI infrastructure sits at the bottom as the foundation, with various use cases built on top and sharing that base.

Making Models Centrally Interchangeable

A frequent requirement from organisations is that models can be exchanged quickly and centrally. The reasons include avoiding vendor lock-in, choosing between model sizes (cost vs. quality), data protection requirements (for example location or jurisdiction) and the rapid release of new model generations.

"New models come along at a monthly pace. You naturally want to enable a fast switch."

Technically, switching within comparable model types is often feasible, but behaviour can change. An infrastructure that treats model changes as a controlled operational process is therefore required.

LLM Proxies and Gateways

This is where LLM proxies and LLM gateways come in: use cases do not address specific provider models directly but stable internal endpoints such as "thinking-large" or "embedding-small". The gateway maintains the mapping from each endpoint to the model behind it. It thus acts as an adapter layer between application and model, and models can be switched centrally without adapting code in every use case.
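A minimal sketch of this mapping in Python, assuming an in-memory routing table. The class names ModelGateway and ModelTarget and all provider and model names are hypothetical placeholders; a real gateway would load the mapping from configuration and also forward the actual request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTarget:
    provider: str  # hypothetical provider identifier
    model: str     # hypothetical provider-specific model name


class ModelGateway:
    """Maps stable internal endpoint names to concrete provider models."""

    def __init__(self, routes: dict[str, ModelTarget]) -> None:
        self._routes = dict(routes)

    def resolve(self, endpoint: str) -> ModelTarget:
        # Use cases only ever reference the logical endpoint name.
        if endpoint not in self._routes:
            raise KeyError(f"unknown endpoint: {endpoint}")
        return self._routes[endpoint]

    def switch(self, endpoint: str, target: ModelTarget) -> ModelTarget:
        # A central model change is one mapping update; use-case code is untouched.
        previous = self.resolve(endpoint)
        self._routes[endpoint] = target
        return previous


gateway = ModelGateway({
    "thinking-large": ModelTarget("provider-a", "large-model-2024"),
    "embedding-small": ModelTarget("provider-b", "small-embedder-v1"),
})

# Switch "thinking-large" to a newer model; callers keep using the same name.
old = gateway.switch("thinking-large", ModelTarget("provider-c", "large-model-2025"))
print(old.model, "->", gateway.resolve("thinking-large").model)
# large-model-2024 -> large-model-2025
```

Because switch returns the previous target, a rollback after a failed evaluation is the same operation in reverse, which fits the idea of model changes as a controlled operational process.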

Keeping Costs Plannable and Monitorable

With "LLM as a service", costs are usually token-based and therefore dynamic: they vary with request volume, text length, tool calls and agent steps. Organisations therefore want to set budgets, see costs per team, API key and model, and prevent misuse (for example with externally accessible bots).

Because all requests run through the gateway, it can measure token consumption and costs, break them down and enforce limits (budgets, rate limits, alerts). Centralised access management is often attached to this as well: API keys, roles, user groups and policies.
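The budget part of this can be sketched as two hooks around each forwarded request: a check before and a record after. This is a minimal sketch; the prices, the CostTracker class and the per-key budgets are hypothetical, and token counts are assumed to come from the provider's response.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices per 1,000 tokens; real prices depend on provider and model.
PRICE_PER_1K = {
    "thinking-large": {"input": 0.005, "output": 0.015},
    "embedding-small": {"input": 0.0001, "output": 0.0},
}


class BudgetExceeded(Exception):
    """Raised when an API key has used up its budget."""


@dataclass
class Usage:
    tokens_in: int = 0
    tokens_out: int = 0
    cost: float = 0.0


class CostTracker:
    def __init__(self, budgets: dict[str, float]) -> None:
        self.budgets = budgets              # budget per API key
        self.usage = defaultdict(Usage)     # (api_key, endpoint) -> Usage

    def spent(self, api_key: str) -> float:
        return sum(u.cost for (key, _), u in self.usage.items() if key == api_key)

    def check(self, api_key: str) -> None:
        # Before forwarding: reject keys whose budget is used up.
        # Keys without a configured budget are rejected as well.
        if self.spent(api_key) >= self.budgets.get(api_key, 0.0):
            raise BudgetExceeded(api_key)

    def record(self, api_key: str, endpoint: str, tokens_in: int, tokens_out: int) -> float:
        # After the response: attribute tokens and cost to key and endpoint.
        price = PRICE_PER_1K[endpoint]
        cost = tokens_in / 1000 * price["input"] + tokens_out / 1000 * price["output"]
        u = self.usage[(api_key, endpoint)]
        u.tokens_in += tokens_in
        u.tokens_out += tokens_out
        u.cost += cost
        return cost


tracker = CostTracker({"team-a": 0.04})
tracker.check("team-a")
tracker.record("team-a", "thinking-large", tokens_in=2000, tokens_out=1000)  # about 0.025
```

Keying usage by (API key, endpoint) gives the breakdown per key and per model directly; rate limits and alerts would hook into the same two points and are omitted here.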

Integrating Company Knowledge via RAG

Language models know the public knowledge contained in their training data, but not an organisation's current internal knowledge. To make that knowledge usable, Retrieval Augmented Generation (RAG) is typically used: documents are split into sections and semantically indexed; when a query arrives, relevant text passages are retrieved and passed to the model as context.

This requires above all:

  • Embedding models for vectorisation
  • Vector databases for semantic search
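The RAG flow described above (split, index, retrieve, pass as context) can be sketched end to end in plain Python. This is a toy under stated assumptions: embed is a hashed bag-of-words stand-in for a real embedding model, VectorStore is an in-memory brute-force stand-in for a vector database, and the chunk size and overlap are arbitrary.

```python
import math
import re
import zlib
from collections import Counter

DIM = 512  # size of the toy embedding space


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Split a document into overlapping windows of words ("sections").
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> list[float]:
    # Stand-in for an embedding model: hashed bag of words, L2-normalised.
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """Stand-in for a vector database: brute-force cosine similarity search."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, list[float]]] = []  # (doc_id, chunk, vector)

    def add(self, doc_id: str, text: str) -> None:
        for piece in chunk(text):
            self.entries.append((doc_id, piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), doc_id, piece)
                  for doc_id, piece, v in self.entries]
        return sorted(scored, reverse=True)[:k]


def build_prompt(question: str, hits: list[tuple[float, str, str]]) -> str:
    # Pass the retrieved passages to the model as context, tagged with their source.
    context = "\n".join(f"[{doc_id}] {piece}" for _, doc_id, piece in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
store.add("hr-handbook", "Employees receive 25 vacation days per year. Unused vacation days expire in March.")
store.add("it-guide", "VPN access requires a hardware token issued by the IT service desk.")
question = "How many vacation days do employees get?"
hits = store.search(question, k=1)
print(build_prompt(question, hits))
```

Tagging each passage with its document ID keeps the answer traceable to its source, which also helps later when evaluation checks expected sources.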

Ingestion: Keeping Data Current

RAG only works when data is current and controlled. Documents change, must be deleted (compliance) or outdated versions should no longer appear. An ingestion pipeline therefore belongs to the infrastructure: it monitors data sources (for example SharePoint, Confluence, file shares), detects changes and updates the vector database.
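A minimal sketch of the change-detection core of such a pipeline, assuming the source connector can deliver a snapshot of the form {doc_id: content}. Connectors for SharePoint, Confluence or file shares are not shown; upsert and remove are hypothetical callbacks standing in for writes to the vector database.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_sync(source: dict[str, str], indexed: dict[str, str]) -> dict[str, list[str]]:
    """Compare a source snapshot {doc_id: content} with the index state {doc_id: hash}."""
    plan: dict[str, list[str]] = {"add": [], "update": [], "delete": []}
    for doc_id, content in source.items():
        if doc_id not in indexed:
            plan["add"].append(doc_id)
        elif indexed[doc_id] != fingerprint(content):
            plan["update"].append(doc_id)
    # Documents deleted at the source must also leave the index (e.g. for compliance).
    plan["delete"] = [doc_id for doc_id in indexed if doc_id not in source]
    return plan


def apply_sync(plan: dict[str, list[str]], source: dict[str, str], indexed: dict[str, str],
               upsert: Callable[[str, str], None], remove: Callable[[str], None]) -> None:
    for doc_id in plan["add"] + plan["update"]:
        upsert(doc_id, source[doc_id])  # re-chunk, re-embed, write to the vector DB
        indexed[doc_id] = fingerprint(source[doc_id])
    for doc_id in plan["delete"]:
        remove(doc_id)                  # drop all chunks of this document
        del indexed[doc_id]


indexed = {"a": fingerprint("old text"), "b": fingerprint("unchanged"), "c": fingerprint("retired")}
source = {"a": "new text", "b": "unchanged", "d": "brand new"}
plan = plan_sync(source, indexed)
print(plan)  # {'add': ['d'], 'update': ['a'], 'delete': ['c']}
apply_sync(plan, source, indexed, upsert=lambda i, t: None, remove=lambda i: None)
```

Comparing content hashes means unchanged documents are not re-embedded, which keeps repeated syncs cheap; many real connectors offer change feeds or modification timestamps that can replace the full snapshot.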

A practical point: poor version management ("v1/v2/v3" in parallel) leads to duplicates and wastes context. It is better to archive old versions and index only current sources.
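If versions are only distinguishable by file name, the selection of current sources can be sketched as below. The naming convention "<name>_v<number>.<ext>" and the function split_current are assumptions for illustration.

```python
import re

# Assumed naming convention: "<name>[_- ]v<number>.<ext>", e.g. "Handbook_v3.docx".
VERSIONED = re.compile(r"^(?P<stem>.+?)[ _-]?v(?P<version>\d+)(?P<ext>\.\w+)$", re.IGNORECASE)


def split_current(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Return (to_index, to_archive): only the highest version per document is indexed."""
    latest: dict[tuple[str, str], tuple[int, str]] = {}
    to_index = []
    for name in filenames:
        m = VERSIONED.match(name)
        if m is None:
            to_index.append(name)  # no version marker: treat as current
            continue
        key = (m["stem"].lower(), m["ext"].lower())
        version = int(m["version"])
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, name)
    to_index += [name for _, name in latest.values()]
    to_archive = [name for name in filenames if name not in to_index]
    return sorted(to_index), sorted(to_archive)


files = ["Handbook_v1.docx", "Handbook_v3.docx", "Handbook_v2.docx", "Budget.xlsx", "Policy-v2.pdf"]
print(split_current(files))
# (['Budget.xlsx', 'Handbook_v3.docx', 'Policy-v2.pdf'], ['Handbook_v1.docx', 'Handbook_v2.docx'])
```

Where the source system keeps its own version history, that metadata is a more reliable signal than file names.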

Assuring Quality: Evaluation Rather Than Flying Blind

When models can be exchanged centrally and systems change, evaluation becomes central: how do you detect whether a model change, prompt update or new data pipeline has improved or degraded quality?

A typical evaluation setup includes test questions, reference answers (for RAG optionally also the expected sources) and metrics such as correctness, completeness and conciseness. LLM-as-a-judge, where a language model scores the answers against the references, is frequently used to run evaluations automatically at scale. Ideally, evaluation happens before a switch (comparing old vs. new) and continuously during operations.
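The old-vs-new comparison can be sketched as follows. The judge is passed in as a function; in practice it would be an LLM-as-a-judge call, here it is a crude keyword-based placeholder (keyword_judge, hypothetical) so the sketch runs without a model. Expected sources for RAG are not covered.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class EvalCase:
    question: str
    reference: str


Answerer = Callable[[str], str]                      # the system under test
Judge = Callable[[str, str, str], dict[str, float]]  # (question, reference, answer) -> scores


def evaluate(answer: Answerer, cases: list[EvalCase], judge: Judge) -> dict[str, float]:
    # Average each metric over all test cases.
    scores = [judge(c.question, c.reference, answer(c.question)) for c in cases]
    return {metric: mean(s[metric] for s in scores) for metric in scores[0]}


def compare(old: Answerer, new: Answerer, cases: list[EvalCase], judge: Judge,
            tolerance: float = 0.02) -> dict[str, dict[str, float]]:
    before, after = evaluate(old, cases, judge), evaluate(new, cases, judge)
    # A metric counts as a regression if it drops by more than the tolerance.
    regressions = {m: after[m] - before[m] for m in before if after[m] < before[m] - tolerance}
    return {"before": before, "after": after, "regressions": regressions}


def keyword_judge(question: str, reference: str, answer: str) -> dict[str, float]:
    # Crude placeholder; an LLM-as-a-judge prompt would produce these scores in practice.
    ref_words, ans_words = set(reference.lower().split()), set(answer.lower().split())
    return {
        "correctness": 1.0 if reference.lower() in answer.lower() else 0.0,
        "completeness": len(ref_words & ans_words) / len(ref_words),
        "conciseness": 1.0 if len(answer.split()) <= 25 else 0.0,
    }


def current_model(question: str) -> str:
    return "Employees get 25 days of vacation per year."


def candidate_model(question: str) -> str:
    return "Employees get plenty of vacation."


cases = [EvalCase("How many vacation days do employees get?", "25 days")]
result = compare(current_model, candidate_model, cases, keyword_judge)
print(result["regressions"])  # {'correctness': -1.0, 'completeness': -1.0}
```

The same compare call can run before a switch and on a schedule during operations; only the answerer functions change.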

Traceability Through Observability

AI systems consist of multiple processing steps: context assembly, query rewriting, retrieval, guardrails, tool calls, post-processing. When something goes wrong, the decisive question is which step influenced the result.

"Every step should be traceable."

For this, logs and traces are used (for example OpenTelemetry, OpenInference) along with suitable observability tools that visualise requests as a flow. This is relevant both for debugging and for operations and governance.
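The underlying idea of a trace, nested spans sharing one trace ID and each carrying a parent, attributes and a duration, can be illustrated with the standard library alone. This is a simplified, single-threaded stand-in modelled loosely on OpenTelemetry's span concept, not its API; the Tracer class and the step names are hypothetical, and in a real setup the OpenTelemetry SDK with an exporter to an observability tool would take this role.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str]
    attributes: dict
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    duration_ms: float = 0.0


class Tracer:
    """Records nested processing steps of one request as spans (single-threaded sketch)."""

    def __init__(self) -> None:
        self.finished: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str, **attributes) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        s = Span(name=name,
                 trace_id=parent.trace_id if parent else uuid.uuid4().hex,
                 parent_id=parent.span_id if parent else None,
                 attributes=dict(attributes))
        self._stack.append(s)
        start = time.perf_counter()
        try:
            yield s
        finally:
            s.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.finished.append(s)


tracer = Tracer()
with tracer.span("request", use_case="hr-assistant"):
    with tracer.span("query_rewrite") as s:
        s.attributes["rewritten"] = "vacation days employees"
    with tracer.span("retrieval", top_k=3) as s:
        s.attributes["doc_ids"] = ["hr-handbook"]
    with tracer.span("llm_call", endpoint="thinking-large"):
        pass

for s in tracer.finished:
    print(f"{s.name:<14} parent={s.parent_id} attrs={s.attributes}")
```

Recording the retrieved document IDs on the retrieval span is what makes it possible to answer later which step, and which source, shaped a given answer.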

Data Protection: PII Detection as the Last Line of Defence

With external model services, the risk arises that requests are at least temporarily stored by the provider or processed outside the desired jurisdiction. A PII detection step is therefore often inserted as the last step before the model: personal data (name, email, IBAN etc.) is replaced or the request is blocked. Implemented centrally in the gateway, this applies automatically to all use cases.
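A minimal sketch of such a step, covering only e-mail addresses and IBANs with regular expressions. The function name protect, the placeholder format and the two modes ("replace" and "block") are assumptions; names cannot be caught reliably by regex, so production PII detection usually adds a named-entity recognition model.

```python
import re

# Illustrative patterns only; names would need an NER model instead of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,3})?\b"),
}


class PIIBlocked(Exception):
    """Raised when a request containing PII must not reach an external model."""


def protect(prompt: str, mode: str = "replace") -> str:
    # Called by the gateway on every outgoing prompt, as the last step before the model.
    found = sorted(label for label, p in PII_PATTERNS.items() if p.search(prompt))
    if found and mode == "block":
        raise PIIBlocked(f"request contains {', '.join(found)}")
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt


text = "Please refund CH93 0076 2011 6238 5295 7 and confirm to anna.muster@example.com."
print(protect(text))
# Please refund <IBAN> and confirm to <EMAIL>.
```

Replacing values with typed placeholders keeps the request answerable for many tasks, while blocking is the stricter option for use cases where even masked requests should not leave the organisation.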

Platform Approach Rather Than Individual Solutions

The common denominator: many requirements repeat themselves across all AI applications. When each application solves them separately, inconsistencies arise in model access, cost control, data integration, quality measurement and observability. The platform approach bundles these topics as infrastructure building blocks so that new use cases can be built faster and operated consistently.

Within this framework I also position the bbv AI Hub: as an "opinionated" assembly of modules (for example gateway, RAG stack, ingestion, evaluation, observability, PII protection) plus integrations that are needed again and again in projects, not as a single feature but as a stable foundation for multiple AI solutions.