Designing for Explainability from Day One (Building Regulated AI: From Principles to Production)

Of all the demands regulated AI places on a system, explainability is the one teams most often try, and fail, to retrofit. They build an opaque system, ship it, and only then discover that a regulator expects reasons, a customer expects an explanation, and a court expects justification — none of which the system can produce. By then the cost of adding explainability is enormous, and sometimes the architecture simply will not allow it. The lesson is unambiguous: explainability is a design property, decided at the start, not a feature bolted on at the end.

Why explainability is non-negotiable

Explainability is demanded by every layer of the landscape from Part 2, for overlapping reasons. Regulators require it to supervise. Sectoral law often requires that affected people receive specific reasons: a declined borrower must be told why. Data protection law grants individuals a right to meaningful information about the logic of automated decisions. And beyond compliance, explainability is what makes a system governable at all: you cannot validate, monitor, debug, or improve a system whose decisions you cannot understand. Opacity is not just a legal problem; it is an operational one.

A decision you cannot explain is a decision you cannot defend, cannot debug, and cannot improve. Explainability is the precondition for everything else.

Explanation is audience-relative

The first conceptual trap is treating "explainability" as one thing. It is not. An explanation is information that helps a particular audience understand a decision, and different audiences need radically different explanations. Building one and assuming it serves all is a common and costly error.

The affected individual

A person denied a loan needs a plain-language account of the main factors that drove the decision and, ideally, what they could change. They do not need feature attributions or model internals; they need actionable, comprehensible reasons. The standard here is human understandability, not technical completeness.

The internal operator

A staff member reviewing or overriding a decision needs enough insight to judge whether it looks right — the key drivers, the model's confidence, anything anomalous about the case. Their explanation is richer than the customer's but still oriented to judgement, not mathematics.

The validator and regulator

An independent validator or supervisor needs the deepest explanation: how the model behaves across cases, what it relies on, how it fails, and whether its reasoning is sound and stable. This audience can handle technical detail and demands it.

The developer

The builders need the most granular view of all, for debugging and improvement. This is where the heaviest technical interpretability tooling lives.

A serious system serves all four, which means explainability is not a single artefact but a layered capability.

Two routes to explainability

Broadly, there are two strategies, and the choice between them is one of the most consequential design decisions you will make.

Intrinsically interpretable models

The first route is to use models that are transparent by construction — linear and logistic models, decision trees, rule sets, generalised additive models. With these, the explanation is the model; you can read directly what drives a decision. The trade-off is presumed to be accuracy, but this is frequently overstated. For many structured, tabular decision problems — exactly the kind common in regulated industries — a well-built interpretable model performs comparably to an opaque one. Reaching for a black box should be a justified choice, not a reflex. When the stakes are high and the data is tabular, start by asking whether an interpretable model will do; often it will, and you have solved explainability for free.
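To make "the explanation is the model" concrete, here is a minimal sketch of a logistic scorer whose per-feature contributions can be read straight off the coefficients. The feature names, coefficients, and intercept are invented for illustration and are not fitted to any real data.

```python
import math

# Hypothetical coefficients for an illustrative credit model (not fitted to real data).
INTERCEPT = -1.0
COEFFICIENTS = {
    "credit_history_years": 0.30,
    "debt_to_income": -4.0,
    "missed_payments_12m": -0.8,
}

def score(features):
    """Return (approval probability, per-feature contributions to the log-odds)."""
    contributions = {name: COEFFICIENTS[name] * features[name] for name in COEFFICIENTS}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

applicant = {"credit_history_years": 1.0, "debt_to_income": 0.5, "missed_payments_12m": 1}
probability, contributions = score(applicant)

# Most negative contribution first: these are the factors pushing hardest towards decline.
for name, value in sorted(contributions.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:+.2f}")
print(f"approval probability: {probability:.3f}")
```

Because each contribution is simply coefficient times value, the numbers printed are the model's actual reasoning rather than an estimate of it. No separate explanation method is involved, so there is nothing further to validate for fidelity.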

Post-hoc explanation of opaque models

The second route is to use an opaque model and explain it after the fact with separate techniques — local attribution methods that estimate each feature's contribution to a specific decision, global methods that characterise overall behaviour, counterfactual methods that identify what would have changed the outcome, and example-based methods that surface similar past cases. These tools are valuable, but carry a serious caveat: a post-hoc explanation is an approximation of the model's reasoning, not the reasoning itself. Two techniques can give different explanations for the same decision, and an explanation that is convincing but unfaithful is worse than none, because it creates false confidence. Post-hoc explanations must themselves be validated for fidelity.
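As one illustration of the counterfactual family mentioned above, the following sketch treats a model purely as a callable and searches for single-feature changes that flip its decision. The stand-in model, the feature names, and the candidate values are all assumptions made for the example; real counterfactual tooling also has to handle plausibility and multi-feature changes, which this deliberately ignores.

```python
def opaque_model(features):
    """Stand-in for a black box: the explainer only calls it, never inspects it."""
    risk = features["debt_to_income"] * 2.0 - features["credit_history_years"] * 0.1
    if features["missed_payments_12m"] > 2:
        risk += 0.5
    return risk < 0.6  # True means approve

def single_feature_counterfactuals(model, features, candidates):
    """Return (feature, new_value) pairs that flip the decision when changed alone."""
    original = model(features)
    flips = []
    for name, values in candidates.items():
        # Try the values closest to the current one first and keep the nearest flip.
        for value in sorted(values, key=lambda v: abs(v - features[name])):
            trial = dict(features, **{name: value})
            if model(trial) != original:
                flips.append((name, value))
                break
    return flips

applicant = {"credit_history_years": 1.0, "debt_to_income": 0.5, "missed_payments_12m": 1}
candidates = {
    "debt_to_income": [0.2, 0.3, 0.4],
    "credit_history_years": [2, 5, 10],
    "missed_payments_12m": [0],
}
flips = single_feature_counterfactuals(opaque_model, applicant, candidates)
print(flips)
```

Each reported flip is a verified fact about the model's behaviour, since the model was actually called on the modified input. It is still only a partial view of the model's reasoning: it is limited to the candidate values tried and says nothing about interactions between features.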

Architectural choices that preserve explainability

Beyond the model itself, system architecture determines whether decisions stay reconstructable. Several choices matter.

Capture inputs at decision time. To explain a decision later, you must know exactly what the model saw when it made it. Persisting the precise input feature values for every decision is the single most important enabler of after-the-fact explanation. Without it, you are reconstructing from memory.

Version everything. The explanation of a decision depends on which model version made it. Recording the model version, and being able to retrieve that exact version, is essential; otherwise you explain today's model for yesterday's decision.

Keep humans in the explanation loop. Where a human reviewed or overrode the decision, capture their reasoning too. The full explanation of a decision often includes the human judgement layered on top of the model.

Separate decision from action. Architectures that record the decision and its basis before acting on it preserve a clean explanatory record; architectures that act first and reconstruct later lose information.
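The four choices above can be sketched as a single decision record that is written before anything acts on the outcome. This is a minimal illustration under stated assumptions: the field names, the `decide_and_record` and `record_review` helpers, and the in-memory list standing in for a durable append-only store are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict, replace
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class DecisionRecord:
    decision_id: str
    model_version: str          # which exact model made the decision
    inputs: dict                # the feature values the model actually saw
    outcome: str
    decided_at: str
    reviewer_id: Optional[str] = None
    reviewer_rationale: Optional[str] = None

def decide_and_record(decision_id, model_version, model, inputs, log):
    """Run the model, persist the record, and only then hand the result back."""
    outcome = model(inputs)
    record = DecisionRecord(
        decision_id=decision_id,
        model_version=model_version,
        inputs=dict(inputs),  # copy so later mutation cannot alter the record
        outcome=outcome,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record  # the caller acts on the outcome only after this returns

def record_review(record, reviewer_id, rationale, log, new_outcome=None):
    """Append a new entry with the human judgement; the original entry is kept."""
    reviewed = replace(
        record,
        reviewer_id=reviewer_id,
        reviewer_rationale=rationale,
        outcome=new_outcome or record.outcome,
    )
    log.append(json.dumps(asdict(reviewed), sort_keys=True))
    return reviewed
```

A caller would invoke `decide_and_record` first and trigger the downstream action (sending a letter, updating an account) only with the returned record in hand. Reviews are appended rather than written over the original entry, so both the model's decision and the human judgement layered on top remain visible.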

The reconstruction test

A useful, concrete standard to design against is the reconstruction test: could a competent colleague who has never seen a particular case reconstruct exactly why the system did what it did, months later, using only the records the system produced? If yes, your explainability is real. If reconstructing a decision requires the original data scientist, a notebook, and an afternoon of detective work, your explainability does not yet exist — you have the capacity to explain in principle but not the system to explain in practice. The distinction matters enormously under regulatory pressure, where explanations are demanded on a timeline and at scale.
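One way to turn part of the reconstruction test into an automated check is to replay a stored decision using nothing but the record and the versioned model it names. The sketch below assumes a hypothetical `MODEL_REGISTRY` mapping version strings to callables and a record shaped as a plain dictionary; a real registry would load archived model artefacts instead.

```python
# Hypothetical registry: version string -> the exact model that was deployed under it.
MODEL_REGISTRY = {
    "credit-v1": lambda f: "approve" if f["debt_to_income"] < 0.40 else "decline",
    "credit-v2": lambda f: "approve" if f["debt_to_income"] < 0.45 else "decline",
}

def reconstruct(record, registry):
    """Replay a stored decision using only the record and the versioned model."""
    model = registry.get(record["model_version"])
    if model is None:
        return {"reproducible": False, "reason": "model version not retrievable"}
    replayed = model(record["inputs"])
    matches = replayed == record["outcome"]
    return {
        "reproducible": matches,
        "reason": "replay matches record" if matches else "replay differs from record",
    }

stored = {
    "model_version": "credit-v1",
    "inputs": {"debt_to_income": 0.42},
    "outcome": "decline",
}
print(reconstruct(stored, MODEL_REGISTRY))

# Replaying against today's model instead would give a different answer,
# which is exactly the "today's model for yesterday's decision" failure.
print(MODEL_REGISTRY["credit-v2"](stored["inputs"]))
```

A replay that matches shows only that the decision is reproducible from the records. The colleague in the test still needs the reasons, but without a reproducible replay no account of the reasons can be trusted.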

Explainability and its limits

A mature view acknowledges that explainability has limits and that honesty about them is itself a control. Some genuinely complex models resist faithful explanation; some explanations are necessarily simplifications. The right response is not to pretend otherwise but to factor it into classification and design: if a decision is high-stakes and the only adequate model is one you cannot faithfully explain, that tension is a risk to be surfaced and resolved — perhaps by choosing a more interpretable model, perhaps by adding human judgement, perhaps by not deploying at all. What you must not do is ship an unexplainable high-stakes system and hope the question never comes. It always comes.

A worked example: explaining a declined application

Suppose a model has declined a loan application. Tracing what each audience actually needs shows why one explanation never suffices.

The applicant needs to be told, in plain language, the principal reasons: that the decision turned mainly on a short credit history and a high existing debt burden, and ideally what would change the outcome. They do not want feature attributions and could not use them. The support agent handling the applicant's call needs a little more: the same key drivers, the model's confidence, and a flag if anything about the case is unusual, so they can judge whether to escalate. The validator reviewing the model needs to know whether "short credit history" is driving declines in a way that disproportionately affects younger or recently-arrived applicants, a fairness question invisible to the other audiences. The developer debugging an odd decline needs the full feature vector, the model version, and the attribution detail to find out why the model behaved unexpectedly. One decision, four explanations, each faithful to the same underlying reasoning but pitched to a different need. A system that produces only one of these has solved explainability for one audience and failed the rest.
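The worked example can be sketched as four views derived from the same stored record. Everything here is illustrative: the record layout, the attribution values, the `monitoring` field holding a group label kept for fairness analysis rather than scoring, and the plain-language phrasing are all assumptions, not a prescribed schema.

```python
PLAIN_LANGUAGE = {
    "credit_history_years": "your credit history is short",
    "debt_to_income": "your existing debt is high relative to your income",
}

def top_drivers(record, n=2):
    """Names of up to n features that pushed hardest towards decline."""
    ranked = sorted(record["attributions"].items(), key=lambda kv: kv[1])
    return [name for name, value in ranked[:n] if value < 0]

def for_applicant(record):
    reasons = [PLAIN_LANGUAGE.get(name, name) for name in top_drivers(record)]
    return "Your application was declined mainly because " + " and ".join(reasons) + "."

def for_agent(record):
    return {
        "drivers": top_drivers(record),
        "confidence": record["confidence"],
        "unusual": record["unusual"],
    }

def for_developer(record):
    return {
        "inputs": record["inputs"],
        "model_version": record["model_version"],
        "attributions": record["attributions"],
    }

def for_validator(records, driver, group_key):
    """Count, per group, the declines where `driver` was among the top drivers."""
    counts = {}
    for r in records:
        if r["outcome"] == "decline" and driver in top_drivers(r):
            group = r["monitoring"][group_key]
            counts[group] = counts.get(group, 0) + 1
    return counts

record = {
    "outcome": "decline",
    "model_version": "credit-v3",
    "inputs": {"credit_history_years": 1, "debt_to_income": 0.5, "income": 30000},
    "attributions": {"credit_history_years": -1.2, "debt_to_income": -0.9, "income": 0.3},
    "confidence": 0.81,
    "unusual": False,
    "monitoring": {"age_band": "18-25"},
}
print(for_applicant(record))
print(for_agent(record))
print(for_validator([record], "credit_history_years", "age_band"))
```

All four views read from one record, which is what keeps them consistent with each other. The validator's view differs in kind from the other three because it aggregates across many decisions rather than describing one.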

The fidelity trap in post-hoc explanation

When teams use opaque models with post-hoc explanation, they often treat the explanation as if it were the model's actual reasoning. It is not — it is an approximation, generated by a separate method, and it can be wrong. This fidelity gap is one of the most dangerous and least appreciated risks in explainability, because an unfaithful explanation is worse than no explanation: it manufactures false confidence in a decision and can mislead the very oversight meant to catch errors.

Two consequences follow. First, post-hoc explanations must themselves be validated for fidelity: you must have evidence that the explanation method genuinely reflects what the model does, not merely that it produces plausible-looking outputs. Second, a sobering test: if two reputable explanation methods, applied to the same decision, disagree about why the model decided as it did, then unless they are answering different questions at least one is unfaithful, and you may not know which. Where the stakes are high and fidelity cannot be assured, that uncertainty is itself a reason to prefer an intrinsically interpretable model, where the explanation and the reasoning are the same thing.
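Both consequences can be checked mechanically, at least in a crude form. The sketch below measures how much two attribution outputs agree on the most important features, and runs a simple ablation check: resetting the feature an explanation ranks highest should move the model's score at least as much as resetting the one it ranks lowest. The scoring function, baseline values, and the two sets of attributions are invented for the example, and passing this check is weak evidence of fidelity, not proof.

```python
def ranked_features(attributions):
    """Feature names ordered from largest to smallest absolute attribution."""
    return sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)

def top_k_agreement(a, b, k=2):
    """Fraction of the top-k features shared by two attribution outputs."""
    return len(set(ranked_features(a)[:k]) & set(ranked_features(b)[:k])) / k

def ablation_effect(model, features, baseline, name):
    """Absolute change in score when one feature is reset to its baseline value."""
    ablated = dict(features, **{name: baseline[name]})
    return abs(model(features) - model(ablated))

def passes_ablation_check(model, features, baseline, attributions):
    """True if ablating the top-ranked feature matters at least as much as the bottom one."""
    ranked = ranked_features(attributions)
    top = ablation_effect(model, features, baseline, ranked[0])
    bottom = ablation_effect(model, features, baseline, ranked[-1])
    return top >= bottom

# Stand-in risk score; note that postcode_risk has no effect on it at all.
def risk_score(f):
    return 2.0 * f["debt_to_income"] - 0.1 * f["credit_history_years"] + 0.0 * f["postcode_risk"]

features = {"debt_to_income": 0.5, "credit_history_years": 1, "postcode_risk": 3}
baseline = {"debt_to_income": 0.3, "credit_history_years": 6, "postcode_risk": 1}

# Hypothetical outputs from two different explanation methods for the same decision.
method_a = {"debt_to_income": 0.4, "credit_history_years": 0.5, "postcode_risk": 0.0}
method_b = {"postcode_risk": 0.6, "debt_to_income": 0.3, "credit_history_years": 0.1}

print(top_k_agreement(method_a, method_b))
print(passes_ablation_check(risk_score, features, baseline, method_a))
print(passes_ablation_check(risk_score, features, baseline, method_b))
```

In this constructed case the second method attributes most weight to a feature the model ignores, and the ablation check catches it. Real disagreements are rarely this clean, which is why low agreement between methods is best treated as a signal to investigate rather than as a verdict on either one.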

A confident explanation that does not reflect the model's actual reasoning is not a control — it is a liability disguised as one.

The interpretability-first discipline

Because of all this, mature teams adopt a default that inverts the common habit: reach for an interpretable model first, and justify reaching for an opaque one. For the structured, tabular decision problems that dominate regulated industries — credit, claims, eligibility, pricing — well-built interpretable models frequently match opaque ones closely enough that the marginal accuracy does not justify the explainability cost. The discipline is to make the choice deliberately: try the interpretable approach, measure the genuine performance gap, and only accept an opaque model when that gap is both real and worth the burden of post-hoc explanation, fidelity validation, and the residual risk that some decisions will resist faithful explanation. Framed this way, "we used a black box" becomes a documented, justified decision rather than a reflex — and very often the honest conclusion is that the interpretable model was good enough, and explainability came for free.
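The discipline of measuring the gap and deciding deliberately can be written down as a small, documented rule. The sketch below compares paired per-fold scores from the two candidate models and accepts the opaque one only if the gap is both distinguishable from fold-to-fold noise and at least as large as a threshold agreed in advance. The two-standard-error test is a rough heuristic chosen for brevity, and the scores and threshold are made up.

```python
from statistics import mean, stdev

def choose_model(interpretable_scores, opaque_scores, min_gap):
    """Compare paired per-fold scores (higher is better; needs at least two folds)."""
    gaps = [o - i for i, o in zip(interpretable_scores, opaque_scores)]
    mean_gap = mean(gaps)
    standard_error = stdev(gaps) / len(gaps) ** 0.5
    gap_is_real = mean_gap > 2 * standard_error      # rough heuristic, not a formal test
    gap_is_worth_it = mean_gap >= min_gap            # threshold agreed before measuring
    choice = "opaque" if (gap_is_real and gap_is_worth_it) else "interpretable"
    return {
        "choice": choice,
        "mean_gap": mean_gap,
        "gap_is_real": gap_is_real,
        "gap_is_worth_it": gap_is_worth_it,
    }

interpretable = [0.81, 0.80, 0.82, 0.79, 0.81]
opaque_small_gain = [0.82, 0.80, 0.82, 0.80, 0.81]
opaque_large_gain = [0.86, 0.85, 0.87, 0.84, 0.86]

print(choose_model(interpretable, opaque_small_gain, min_gap=0.02))
print(choose_model(interpretable, opaque_large_gain, min_gap=0.02))
```

The returned dictionary is meant to be stored alongside the model documentation, so that "we used a black box" is backed by the measured gap and the threshold it was judged against. Setting `min_gap` before looking at the results is the part that makes the choice deliberate.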

Designing the explanation pipeline

Explainability is not only about the model; it is about the system that captures and serves explanations. The practical foundations bear repeating as a checklist of design commitments: capture the exact inputs at decision time, so a decision can be explained from what the model actually saw rather than reconstructed from memory; record the model version, so you explain the right model for the right decision; capture any human judgement layered on top; and store all of it durably enough to answer questions months or years later, at the scale and on the timeline a regulator or court will demand. An explanation capability that works for one decision in a demo but cannot be produced for ten thousand past decisions on request is not yet a real capability.
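Whether the capability holds for ten thousand past decisions, rather than one, can be probed with a completeness audit over the stored records. This sketch assumes records are dictionaries with the hypothetical field names shown and that the set of retrievable model versions is known; it reports which decisions could not currently be explained and why.

```python
REQUIRED_FIELDS = ("decision_id", "model_version", "inputs", "outcome", "decided_at")

def audit_explainability(records, retrievable_versions):
    """Map decision_id -> list of gaps that would block an explanation."""
    gaps = {}
    for record in records:
        # A field counts as missing if it is absent or empty.
        problems = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", {})]
        version = record.get("model_version")
        if version and version not in retrievable_versions:
            problems.append("model_version not retrievable")
        # A human review without a captured rationale is also a gap.
        if record.get("reviewer_id") and not record.get("reviewer_rationale"):
            problems.append("reviewer_rationale")
        if problems:
            gaps[record.get("decision_id") or "<unknown>"] = problems
    return gaps

records = [
    {"decision_id": "d-1", "model_version": "credit-v3", "inputs": {"debt_to_income": 0.5},
     "outcome": "decline", "decided_at": "2024-01-10T09:00:00+00:00"},
    {"decision_id": "d-2", "model_version": "credit-v1", "inputs": {},
     "outcome": "approve", "decided_at": "2023-06-02T14:30:00+00:00"},
    {"decision_id": "d-3", "model_version": "credit-v3", "inputs": {"debt_to_income": 0.2},
     "outcome": "approve", "decided_at": "2024-02-01T11:15:00+00:00", "reviewer_id": "agent-7"},
]
print(audit_explainability(records, retrievable_versions={"credit-v3"}))
```

Run on a schedule, a check like this surfaces gaps while they are still cheap to fix, instead of when a regulator's request arrives.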

In the next part: data governance and lineage — the foundation beneath explainability and fairness alike, and the discipline of knowing exactly where every input came from and how it was transformed.
