Model Risk Management for AI (Building Regulated AI: From Principles to Production)

Regulated AI did not arrive in a vacuum. Financial institutions have governed quantitative models for decades under a discipline called model risk management (MRM), shaped by hard-won lessons and codified in supervisory guidance after models contributed to real crises. Anyone building regulated AI inherits this discipline whether they know it or not, because regulators reason about AI through its lens. Understanding MRM gives you a proven scaffold — and understanding where AI strains that scaffold tells you where the new, hard work lies.

What model risk actually is

Model risk is the risk of harm arising from decisions based on model outputs that are wrong or misused. It comes from two sources: the model may be fundamentally flawed — built on bad assumptions, poor data, or faulty mathematics — or it may be used incorrectly — applied to cases it was never designed for, or trusted beyond its competence. Note that both sources matter. A perfect model used wrongly produces model risk just as surely as a flawed model used as intended. This dual framing is one of MRM's enduring contributions, and it maps directly onto AI: the cleanest model in the world becomes a liability the moment it is pointed at a population it never saw in training.

Model risk is not just "is the model right?" It is "is the model right, and is it being used for what it is right about?"
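The "used for what it is right about" half of the question can be made concrete with a scope check at scoring time. The sketch below is a minimal illustration, not a prescribed MRM control: the names FeatureRange, out_of_scope_features and TRAINING_RANGES, and the feature ranges themselves, are all hypothetical. It assumes the training data's numeric ranges were recorded during development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRange:
    """Range of one numeric feature observed in training (hypothetical values)."""
    low: float
    high: float


def out_of_scope_features(record: dict[str, float],
                          ranges: dict[str, FeatureRange]) -> list[str]:
    """Return the features that are missing or fall outside the training range."""
    flagged = []
    for name, rng in ranges.items():
        value = record.get(name)
        if value is None or not (rng.low <= value <= rng.high):
            flagged.append(name)
    return flagged


# Illustrative ranges only; real ones would come from the development record.
TRAINING_RANGES = {
    "age": FeatureRange(21, 75),
    "annual_income": FeatureRange(12_000, 250_000),
}

applicant = {"age": 19, "annual_income": 40_000}
flags = out_of_scope_features(applicant, TRAINING_RANGES)
if flags:
    print("Outside the population the model was built for:", flags)
```

A range check like this catches only the crudest misuse. It says nothing about combinations of features the model never saw, so it complements the judgement of whether a use is in scope rather than replacing it.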

The model lifecycle

MRM organises a model's existence into a lifecycle, with controls at each stage. The stages are worth walking through, because each carries over to AI with a twist.

Development

The model is conceived, data is gathered, and the model is built and documented. MRM demands that development be sound and, crucially, documented as it happens — the assumptions, the data choices, the alternatives considered and rejected. For AI, this stage expands enormously, because the data is larger, the provenance harder to track, and the modelling choices (architecture, features, hyperparameters) more numerous. The documentation burden grows accordingly.

Validation

Before a model is used, an independent party assesses whether it is fit for purpose. This is the heart of MRM and the subject of a later part in its own right. The principle is independence: the validator is not the developer, and brings a sceptical, adversarial mindset. For AI, validation must grapple with phenomena traditional models lacked — opacity, high-dimensional inputs, and behaviour that shifts with data.

Implementation

The validated model is put into production. MRM cares that the deployed system matches the validated one — that no silent changes crept in between approval and live operation. For AI this is sharper, because the gap between a model in a notebook and a model in a production pipeline is wide, and small discrepancies in data handling can change behaviour materially.
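One way to detect silent changes between approval and live operation is to fingerprint every component at validation time and compare the fingerprints at deployment. The following is a sketch under stated assumptions: the component names, the byte strings standing in for real artefacts, and the helpers build_manifest and changed_components are invented for illustration.

```python
import hashlib


def build_manifest(components: dict[str, bytes]) -> dict[str, str]:
    """Map each component name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(blob).hexdigest()
            for name, blob in components.items()}


def changed_components(approved: dict[str, str],
                       deployed: dict[str, str]) -> list[str]:
    """Names whose digest differs, or that exist on only one side."""
    names = sorted(set(approved) | set(deployed))
    return [n for n in names if approved.get(n) != deployed.get(n)]


# Stand-ins for real artefacts: weights plus the data-handling configuration.
validated = {
    "weights": b"model-weights-v1",
    "preprocessing": b"scale=standard;impute=median",
}
in_production = {
    "weights": b"model-weights-v1",
    "preprocessing": b"scale=standard;impute=mean",  # silent change
}

approved_manifest = build_manifest(validated)
drifted = changed_components(approved_manifest, build_manifest(in_production))
print("Components differing from the validated version:", drifted)
```

The manifest covers the preprocessing configuration as well as the weights because, as noted above, small discrepancies in data handling can change behaviour even when the model file is byte-for-byte identical.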

Ongoing monitoring and use

The model is watched in production for continued fitness. MRM's insistence on ongoing monitoring is exactly the "lifecycle, not launch" mindset from Part 1. For AI it is non-negotiable, because AI models can drift faster and more silently than traditional ones.
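A common way to quantify drift in a score or input distribution is the population stability index (PSI), which compares the share of observations per bin in a reference sample against a live sample. The sketch below is one possible implementation, assuming non-empty samples. The cut points, the sample scores and the small floor used to avoid taking the logarithm of zero are illustrative choices, and any alert threshold would have to be set by the institution.

```python
import math
from bisect import bisect_right


def bin_shares(values: list[float], cut_points: list[float],
               floor: float = 1e-6) -> list[float]:
    """Share of values per bin; a value equal to a cut point goes to the upper bin.

    Empty bins are floored so the log ratio below stays defined, which means
    the shares may sum to very slightly more than 1.
    """
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_right(cut_points, v)] += 1
    total = len(values)
    return [max(c / total, floor) for c in counts]


def psi(expected: list[float], actual: list[float],
        cut_points: list[float]) -> float:
    """Population stability index: sum of (a - e) * ln(a / e) over bins."""
    e = bin_shares(expected, cut_points)
    a = bin_shares(actual, cut_points)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up scores: the live sample has shifted towards higher values.
reference_scores = [0.12, 0.18, 0.31, 0.36, 0.44, 0.52, 0.58, 0.67, 0.73, 0.81]
live_scores = [0.41, 0.48, 0.55, 0.61, 0.66, 0.72, 0.78, 0.83, 0.88, 0.93]
cuts = [0.25, 0.5, 0.75]

print("PSI, reference vs itself:", psi(reference_scores, reference_scores, cuts))
print("PSI, reference vs live:  ", psi(reference_scores, live_scores, cuts))
```

Every term in the sum is non-negative, so the index is zero only when the two binned distributions match and grows as they diverge. Running a check like this on a schedule, per feature and on the output score, is one route to the more automated monitoring that AI needs.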

Retirement

Eventually the model is decommissioned or replaced. MRM treats retirement as a controlled event — you confirm what replaces it, ensure nothing still depends on it, and preserve its records. Orphaned models that quietly keep running long after anyone owns them are a classic source of risk.
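The three retirement checks can be expressed as a simple gate that lists whatever still blocks decommissioning. This is a minimal sketch. The function retirement_blockers, the consumers mapping (system name to the model IDs it calls) and the example identifiers are hypothetical, and a real programme would draw this information from its inventory.

```python
from typing import Optional


def retirement_blockers(model_id: str,
                        consumers: dict[str, set[str]],
                        replacement: Optional[str],
                        records_archived: bool) -> list[str]:
    """List the reasons a model cannot yet be retired (empty list = clear)."""
    blockers = []
    dependents = sorted(name for name, models in consumers.items()
                        if model_id in models)
    if dependents:
        blockers.append("still used by: " + ", ".join(dependents))
    if replacement is None:
        blockers.append("replacement not confirmed")
    if not records_archived:
        blockers.append("records not archived")
    return blockers


# Hypothetical systems and the models each one depends on.
consumers = {
    "loan-origination": {"credit-score-v2", "fraud-v5"},
    "collections": {"credit-score-v3"},
}

blockers = retirement_blockers("credit-score-v2", consumers,
                               replacement="credit-score-v3",
                               records_archived=False)
print(blockers)
```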

The three pillars MRM gives you

Three of MRM's mechanisms transfer to AI almost unchanged, and they are worth adopting wholesale.

Independent validation

The requirement that someone independent of the builder critically assess a model before and during its use is MRM's central control. It works because builders are too close to their own work to see its flaws, and because independence creates accountability. AI does not change this principle; it only raises the bar for what the validator must be capable of assessing.

The model inventory

You cannot govern what you cannot see, and the model inventory is how institutions see their models — a complete, maintained register of every model in use, its purpose, owner, risk tier, validation status, and dependencies. For AI, the inventory is both more important and harder to maintain, because models proliferate, embed in larger systems, and sometimes get built outside the official process. A serious AI programme begins by ensuring every system is in the inventory; the systems that hurt you are usually the ones nobody knew were there.
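As a rough illustration, an inventory entry can be modelled as a typed record carrying the fields listed above. The class names, enum values and the governance_gaps helper are assumptions made for this sketch, not a standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskTier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class ValidationStatus(Enum):
    NOT_VALIDATED = "not_validated"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class InventoryEntry:
    """One row of the model inventory (illustrative fields only)."""
    model_id: str
    purpose: str
    owner: str
    risk_tier: RiskTier
    validation_status: ValidationStatus
    dependencies: list[str] = field(default_factory=list)


def governance_gaps(entries: list[InventoryEntry]) -> list[str]:
    """IDs of entries with no named owner or without an approved validation."""
    return [e.model_id for e in entries
            if not e.owner.strip()
            or e.validation_status is not ValidationStatus.APPROVED]


inventory = [
    InventoryEntry("credit-score-v3", "retail credit decisions", "A. Owner",
                   RiskTier.HIGH, ValidationStatus.APPROVED, ["bureau-feed"]),
    InventoryEntry("churn-llm-pilot", "retention outreach", "",
                   RiskTier.MEDIUM, ValidationStatus.NOT_VALIDATED),
]

print("Entries needing attention:", governance_gaps(inventory))
```

Typed fields make gaps queryable: a blank owner or an unvalidated model becomes something a report can surface rather than something discovered during an examination.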

Effective challenge

MRM enshrines the idea of effective challenge — critical scrutiny by competent, independent, empowered parties. It is not enough to have a validator; the validator must actually be able to challenge, and the organisation must take the challenge seriously. Effective challenge is the cultural core of MRM, and it is precisely what stops governance from collapsing into theatre.

Where AI breaks the assumptions

MRM is a gift, but it was built for a world of smaller, more interpretable, more stable models. AI strains several of its assumptions, and the strain is where the genuinely new work concentrates.

Interpretability. Traditional models were often transparent by construction — a logistic regression's coefficients tell you what it is doing. Many AI models are opaque, which makes both validation and explanation far harder. A whole part of this course is devoted to designing for explainability precisely because MRM assumed it for free.
Data scale and provenance. MRM assumed you could account for your data. Modern AI trains on volumes so large that tracking the provenance and quality of every input is a serious engineering problem in itself — the subject of the data-governance parts ahead.
Drift and adaptivity. Traditional models were relatively stable; AI models can degrade quickly as the world shifts, and some systems even learn online. Monitoring must be more intensive and more automated than MRM originally contemplated.
Emergent and agentic behaviour. The most modern systems plan and act over multiple steps in ways their builders did not explicitly program. MRM has no native concept of this, which is why agentic AI gets its own extended treatment later.
Third-party and foundation models. Increasingly the model at the heart of your system was built by someone else and is opaque even to you. MRM's assumption that you understand your own model breaks down, and vendor-risk discipline must fill the gap.

Adopt the scaffold, extend it deliberately

The practical lesson is to take MRM as your starting scaffold rather than inventing governance from scratch. Its lifecycle, its independent validation, its inventory, and its culture of effective challenge are battle-tested and already familiar to regulators, so building on them means your governance is expressed in terms supervisors recognise. Then extend the scaffold deliberately at the points where AI breaks it: richer explainability, industrial-strength data lineage, more intensive monitoring, and new disciplines for agentic and third-party systems. The rest of this course is, in a sense, that deliberate extension: taking each strained assumption in turn and building the additional control the strain demands.

A lesson written in crisis

It is worth remembering why model risk management exists at all, because the origin explains its instincts. The discipline was forged in the aftermath of episodes where reliance on flawed or misused models contributed to serious financial harm — models whose assumptions quietly stopped holding, whose limitations were poorly understood by those who relied on them, and whose risks no independent party had effectively challenged. The supervisory response was not to ban models but to insist that institutions govern them: know what models they have, validate them independently, understand their limitations, and own the risk they carry. Every instinct of MRM — the inventory, the independent validation, the insistence on effective challenge — is a direct response to a way models had previously failed. Inheriting MRM means inheriting those hard-won lessons for free, which is precisely why building on it rather than reinventing governance is such a sound move.

The model inventory as the foundation

Of all MRM's mechanisms, the model inventory deserves special emphasis, because it is both the most basic and the most frequently incomplete. The principle is simple: you cannot govern what you cannot see, so you must maintain a complete register of every model in use. The difficulty is that AI models proliferate and hide. They get embedded inside larger systems, built by teams outside the official process, spun up in experiments that quietly become production, and inherited through acquisitions. The models that cause the worst surprises are almost always the ones nobody knew were there — the "shadow" models operating outside governance entirely.

A serious AI programme therefore begins not with sophisticated controls but with the unglamorous work of discovery: finding every model, including the unofficial ones, and getting them into the inventory with their purpose, owner, risk tier, validation status, and dependencies recorded. This is harder than it sounds and never quite finished, which is why mature programmes treat inventory completeness as an ongoing discipline with active discovery, not a one-time data-gathering exercise.
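Active discovery can be as simple, in principle, as reconciling what the inventory says against what is actually found running. The sketch below assumes the discovered identifiers come from some scan of serving endpoints or scheduled jobs; that scan, the reconcile function and the example IDs are all hypothetical.

```python
def reconcile(inventory_ids: set[str],
              discovered_ids: set[str]) -> dict[str, list[str]]:
    """Compare the register against what discovery actually found.

    "shadow": running but not in the inventory (ungoverned).
    "stale":  in the inventory but not found running (possibly retired
              without the record being updated, or missed by the scan).
    """
    return {
        "shadow": sorted(discovered_ids - inventory_ids),
        "stale": sorted(inventory_ids - discovered_ids),
    }


# Hypothetical identifiers for illustration.
registered = {"credit-score-v3", "fraud-v5", "pricing-v1"}
found_running = {"credit-score-v3", "fraud-v5", "churn-llm-pilot"}

report = reconcile(registered, found_running)
print("Shadow models:", report["shadow"])
print("Stale entries:", report["stale"])
```

The comparison itself is trivial. The hard part is the discovery feed, which is why completeness is an ongoing discipline and why an empty "shadow" list is only as trustworthy as the scan behind it.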

The first question a capable examiner asks is "show me your model inventory." The second is "how do you know it's complete?" The honest answer to the second is what separates real governance from the appearance of it.

Effective challenge: the cultural core

MRM's most important contribution may be the least tangible: the principle of effective challenge. It is not enough to have a validator, a committee, or an audit function; those parties must actually challenge, be competent to do so, and be empowered to make their challenge stick. Effective challenge has three preconditions worth stating plainly. It requires competence — the challenger must understand the system well enough to find its real flaws, which for AI sets a high bar. It requires independence — the challenger must have no stake in the outcome and no fear of delivering bad news. And it requires standing — the organisation must take the challenge seriously and act on it, rather than noting it and proceeding regardless. Where any of these is missing, challenge degrades into theatre, and the whole apparatus of governance becomes a costly performance. Where all three are present, effective challenge is the mechanism that actually catches problems before they become harms. It is, in the end, less a process than a culture — the institutional willingness to have one's work genuinely questioned and to change course when the questioning reveals a flaw.

In the next part: designing for explainability from day one — why opacity is a design choice as much as a technical fact, and how to build systems whose decisions can be reconstructed and explained.
