Third-Party and Foundation-Model Risk (Building Regulated AI: From Principles to Production)

For most of this course we have implicitly assumed that you build your own models, with access to their data, design, and internals. That assumption is fast becoming the exception. More and more, the model at the heart of a system is one you did not build and cannot fully inspect — a vendor's proprietary model, or a large foundation model accessed through an interface, whose training data, architecture, and behaviour are largely opaque even to its own makers. This shift does not reduce your responsibility; a regulator holds you accountable for the decisions your system makes, regardless of whose model made them. But it changes the shape of governance, because the tools that assume access to internals no longer apply. This part is about governing the models you depend on but do not own.

You cannot outsource accountability

The single most important principle is that accountability does not transfer with the model. When you build a system on a third-party or foundation model, you remain accountable for what that system does. "The vendor's model made the decision" is no more a defence than "the algorithm did it". You chose to rely on that model, for that purpose, in that context, and you own that choice and its consequences. This reframing is essential because it dispels the comforting but false notion that using an external model offloads risk. It offloads capability, not responsibility — and capability you do not understand is responsibility you cannot easily discharge.

You can buy a model. You cannot buy your way out of accountability for the decisions it makes on your behalf.

What makes external models hard

Several properties of third-party and foundation models complicate the governance this course has described:

Opacity. You may know little about how the model was built, what data it was trained on, or how it behaves across cases. The validation and explainability disciplines that assume access to internals must adapt.
Unknown provenance. You often cannot verify the lawfulness, quality, or representativeness of the data the model was trained on — the very data-governance assurances of Part 7 that you would demand of your own data.
Generality and misfit. Foundation models are built for general use, not your specific purpose. A general capability applied to a regulated decision may be poorly suited in ways that are not obvious until it fails on the cases that matter to you.
Shared and shifting behaviour. A model used by many customers may change beneath you as the provider updates it, altering your system's behaviour without a change on your side — a change-management problem you do not control.
Concentration risk. When many systems depend on the same external model, a flaw or outage in that model propagates widely, creating systemic exposure that regulators are increasingly attentive to.

Due diligence before you depend

Governing an external model begins before you adopt it, with due diligence proportionate to the risk of the decisions it will drive. The aim is to understand, as far as the provider allows, what you are relying on: what the model is designed for and its stated limitations; what the provider can tell you about its data, testing, and known weaknesses, especially around fairness and robustness; what assurances, documentation, and certifications the provider offers; how the provider manages changes, security, and incidents; and how stable and reputable the provider is as a long-term dependency. Where a provider is unwilling or unable to give you enough to govern a high-risk use, that itself is a finding — a model you cannot get comfortable with should not sit behind a decision you are accountable for.

Contractual and operational control

Some of the control you cannot get technically, you must get contractually and operationally. Contracts with model providers should, for high-risk uses, address the things your governance needs: rights to information and audit, notice of material changes to the model, security and incident commitments, data-handling and provenance assurances, and clear allocation of responsibilities. Contractual control is imperfect — it does not let you inspect the model directly — but it establishes obligations you can hold the provider to and evidence you can show a regulator. Operationally, you should also retain the ability to detect when a dependency has changed or degraded (monitoring, Part 17) and to fall back or switch if a provider fails you, so that a single external model is not an unmanaged single point of failure.

Validating what you cannot inspect

Validation (Part 12) does not disappear when you cannot see inside the model; it shifts its weight. Unable to assess internals, you assess behaviour and fit for your context:

Behavioural testing. Probe the model extensively on cases representative of your actual use, including edge cases and adversarial ones, to map how it performs and where it fails for your purpose — not the provider's general benchmarks.
Fairness testing in your context. Run the disparate-impact analysis of Part 9 on the model as you use it, because a model's fairness depends on the population and decision it is applied to, which the provider cannot have tested for you.
Wrapping and constraining. Build controls around the external model — input validation, output checks, guardrails, human oversight — so that your system's behaviour is governed even though the model at its core is not fully knowable. You govern the system you built around the model, even where you cannot govern the model itself.
Ongoing revalidation. Because the external model may change beneath you, behavioural validation must be ongoing, re-run when the provider updates the model or when your monitoring suggests its behaviour has shifted.

Foundation models and their particular challenges

Large foundation models deserve specific mention, because their generality and their mode of use raise distinctive issues. Their broad, open-ended capabilities make their behaviour harder to bound and test exhaustively — the space of things they might do is vast. Systems built on them, especially agentic ones, inherit the security concerns of Part 15, prompt injection foremost among them. And their outputs can be fluent and confident while being wrong, a combination that is especially hazardous in regulated decisions where confident wrongness can be mistaken for reliability. None of this means foundation models cannot be used in regulated settings; it means their use must be bounded, wrapped in controls, validated behaviourally for the specific purpose, and overseen with particular care — exactly the disciplines this course has built, applied to a component you understand least.

The dependency you must own

The throughline is that depending on someone else's model is a deliberate risk decision that you own and must govern, not a way to make risk someone else's problem. The model may be external; the accountability, the validation in your context, the controls around it, the monitoring of its behaviour, and the consequences of its failures are all yours. Treated that way — with due diligence, contractual control, behavioural validation, and surrounding guardrails — third-party and foundation models can be used responsibly even in high-stakes settings. Treated as a black box you trust because someone reputable built it, they are an accountability gap waiting to be exposed. The final part now draws every thread of this course together into the operating model that makes all of it work in practice.

The accountability that stays with you

It is worth pressing on the central principle, because it is the one teams most want to wish away: when you build a system on a model someone else created, the accountability for that system's decisions stays with you. The comforting intuition is that using a reputable vendor's model, or a well-known foundation model, transfers some of the risk to its maker — that if the model is flawed, that is their problem. It is not, at least not in the way that matters. A regulator holds the organisation that deployed the system accountable for the decisions it made, regardless of whose model made them. You chose to rely on that model, for that purpose, in that context, and you own that choice. What an external model offloads is capability — you did not have to build it — not responsibility for what it does on your behalf. And capability you do not fully understand is responsibility that is harder, not easier, to discharge, because you must govern a component you cannot fully see. This is why third-party and foundation models do not reduce your governance burden so much as change its shape, shifting weight from inspecting internals to controlling behaviour and context.

You can buy the model. You cannot buy your way out of owning the decisions it makes for you. The vendor supplies capability; the accountability stays at your door.

Validating a black box you did not build

Traditional validation assumes access to a model's internals, which you may not have for a third-party or foundation model — but this does not excuse you from validation; it redirects it toward behaviour and fit for your specific context. Unable to inspect how the model works, you assess what it does where you use it. Behavioural testing probes the model extensively on cases representative of your actual use, including edge cases and adversarial ones, to map how it performs and fails for your purpose rather than relying on the provider's general benchmarks, which were not run on your population or your decision. Fairness testing must be done in your context, because a model's fairness depends on the population and decision it is applied to, which the provider cannot have tested for you. Wrapping and constraining builds controls around the external model — input validation, output checks, guardrails, human oversight — so that the system you assemble behaves acceptably even though the model at its core is not fully knowable; you govern the system you built around the model even where you cannot govern the model itself. And because the external model may change beneath you when the provider updates it, this validation must be ongoing rather than one-time. The throughline is that opacity changes the method of validation but never removes the obligation: you are still accountable for establishing that the system is fit, by whatever means the opacity leaves available.

Foundation models and confident wrongness

Large foundation models raise a distinctive hazard worth isolating: they can be fluent, articulate, and confident while being wrong, and that combination is especially dangerous in regulated decisions. A model that hedges and signals uncertainty invites scrutiny; a model that produces a polished, assured answer invites trust — and when the assured answer is wrong, the very confidence that makes it persuasive makes the error more likely to be acted upon. In a regulated context, confident wrongness can slip past the human oversight meant to catch it, precisely because it does not look like the kind of output that warrants a second look. The generality of foundation models compounds this: built for open-ended use, their behaviour is harder to bound and test exhaustively, and the space of things they might produce is vast. None of this rules out foundation models in regulated settings, but it sharpens the requirements: their use must be bounded to specific purposes, wrapped in controls that check their outputs rather than trusting them, validated behaviourally for the actual task, and overseen with particular care by humans alerted to the specific risk that a confident answer is not necessarily a correct one. Systems built on foundation models, especially agentic ones, also inherit the security concerns of the previous part — prompt injection foremost — which is another reason their autonomy and reach must be deliberately constrained.

Owning the dependency

The practical posture toward external models follows from all of this: depending on someone else's model is a deliberate risk decision that you own and must govern actively, not a way to make risk someone else's problem. Before you depend, conduct due diligence proportionate to the stakes — understanding the model's intended use and limitations, what the provider can tell you about its data and testing, what assurances and documentation it offers, and how it manages change, security, and incidents — and treat a provider that cannot give you enough to govern a high-risk use as a finding in itself. Secure through contracts what you cannot secure technically: rights to information, notice of material changes, security and incident commitments, and clear allocation of responsibilities. Monitor the dependency for change and degradation, and retain the ability to fall back or switch so that a single external model is not an unmanaged single point of failure, including the concentration risk that arises when many of your systems, or many firms across the market, lean on the same model. Governed this way, third-party and foundation models can be used responsibly even in high-stakes settings; treated as a trusted black box because someone reputable built it, they are an accountability gap waiting to be exposed. The model may be external, but the validation in your context, the controls around it, the monitoring of its behaviour, and the consequences of its failure are all yours to own.

In the next part: the operating model — assembling classification, governance, the technical disciplines, and operations into a single, coherent way of building and running regulated AI.

← Previous lesson · Next lesson →