Lesson 11 of 20 · 9 min read

Documentation and the Audit Trail

In regulated AI, a control you cannot evidence is a control that does not exist. This part covers what to document, the difference between documentation and a live audit trail, and how to generate evidence as a by-product of operation.


There is an uncomfortable truth at the centre of regulated AI: the regulator does not experience your system, only its records. Your model may be brilliant, your controls rigorous, your team conscientious — but if you cannot evidence any of it, none of it counts when you are asked to demonstrate compliance. This is the principle that governs this part: a control that works but cannot be proven to have worked is, for regulatory purposes, a control that does not exist. Documentation and the audit trail are how the invisible work of governance becomes visible, durable, and defensible.

Two kinds of evidence

It helps to distinguish two related but different things, because teams often build one and neglect the other.

Documentation: the static record

Documentation is the deliberately authored account of what a system is, how it was built, and how it is governed — the model's design, its data, its validation, its risk classification, its controls, its owners. It is relatively static, updated at milestones, and tells the story of the system. Documentation answers "what is this system and how was it built responsibly?"

The audit trail: the living log

The audit trail is the automatically generated, continuously growing record of what the system actually did — every decision, the data behind it, the model version, any human involvement, every control that fired. It is dynamic, machine-generated, and tells the history of the system's operation. The audit trail answers "what exactly happened, for this specific decision, on this specific date?"
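To make this concrete, here is a minimal sketch in Python of what one audit-trail entry might hold, assuming one structured record per decision. The class and field names (`AuditRecord`, `decision_id`, `controls_fired`, and so on) and the example values are hypothetical. They mirror the list above and are not drawn from any particular standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One entry in the audit trail: what happened for a single decision."""
    decision_id: str
    model_version: str                    # exact version that produced the output
    inputs: dict[str, Any]                # the data the decision was based on
    output: Any                           # what the system decided
    controls_fired: tuple[str, ...] = ()  # names of controls that ran on this decision
    human_reviewer: Optional[str] = None  # who was involved, if anyone
    human_outcome: Optional[str] = None   # what the reviewer concluded
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Hypothetical example entry for an automated decision with no human involved.
record = AuditRecord(
    decision_id="d-0001",
    model_version="credit-risk-2.3.1",
    inputs={"income": 42000, "requested_amount": 5000},
    output="approved",
    controls_fired=("input_range_check",),
)
serialised = asdict(record)  # plain dict, ready to write to a durable store
```

Making the record frozen is one way to signal that entries are written once and never edited in place.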

You need both. Documentation without an audit trail is a beautiful description of a system you cannot prove behaved as described. An audit trail without documentation is a mountain of logs no one can interpret. Together they let you answer both the general question ("is this system well-governed?") and the specific one ("justify what it did to this person").

What to document

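For a high-risk system, the documentation set should let a knowledgeable outsider understand and assess it without recourse to the original team. At minimum it covers the elements described above: the model's design, its data, its validation, its risk classification, its controls, and its owners, together with an honest account of its limitations.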

The standardised expression of much of this is sometimes called a model card or system fact sheet — a consistent template applied to every system so nothing is forgotten and comparison is easy.
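A template only prevents omissions if something checks it. As a minimal sketch, assume the template is a Python dataclass whose sections are the ones named above. A small check can then report any section left blank. `ModelCard` and `missing_sections` are hypothetical names, and the example content is illustrative only.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical minimal template: one instance per system."""
    system_name: str
    design: str
    data: str
    validation: str
    risk_classification: str
    controls: str
    owners: str
    limitations: str


def missing_sections(card: ModelCard) -> list[str]:
    """Return the names of any sections that are empty or whitespace-only."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


card = ModelCard(
    system_name="credit-risk",
    design="Gradient-boosted classifier over application features.",
    data="Historical applications; lineage recorded in the pipeline.",
    validation="Independent validation report, version 1.",
    risk_classification="High-risk",
    controls="Input range checks; human review of declines.",
    owners="Model owner: credit risk team.",
    limitations="",  # left blank: the check should flag this
)
gaps = missing_sections(card)
```

With this check in place, a blank limitations section shows up as a gap instead of passing unnoticed.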

The discipline of honest documentation

Documentation is only useful if it is honest, and the strongest temptation is to write it to impress rather than to inform — to omit limitations, gloss over weaknesses, and present a flattering account. This is a profound mistake. A regulator who finds that your documentation overstated your controls trusts nothing else you say, and the limitations section is, perversely, the most credibility-building part of the whole set. Documenting what a system cannot do, and where it should not be trusted, demonstrates exactly the clear-eyed risk awareness regulators look for. The most dangerous document is the one that claims a system has no weaknesses; every real system has them, and pretending otherwise signals either dishonesty or a lack of understanding, both fatal to trust.

The limitations section is not the embarrassing part of the documentation. It is the part that proves you understand your own system.

Document as you go

The single most important practical habit is to document contemporaneously, as decisions are made, rather than reconstructing them afterward. Documentation written months later, from memory, before an audit, is both painful to produce and unreliable: memory fades, people leave, and the rationale for choices evaporates. Worse, reconstructed documentation has a quality regulators are skilled at detecting. It reads like a justification rather than a record. Capturing decisions and their reasons as they happen costs little in the moment and saves enormous effort and credibility later. This is as much a cultural and tooling problem as a matter of individual discipline: make it easy to record decisions in the flow of work, and people will.
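One low-friction way to capture decisions in the flow of work is an append-only decision log, timestamped automatically at the moment of writing. The sketch below assumes a JSON Lines file. The function name `record_decision`, the field names, and the example decisions are all hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_decision(log_path: Path, decision: str, rationale: str,
                    alternatives: list[str], author: str) -> dict:
    """Append one decision to a JSON Lines log, timestamped when it is made."""
    entry = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "author": author,
        "decision": decision,
        "rationale": rationale,
        "alternatives_considered": alternatives,
    }
    # Append mode: earlier entries are never rewritten, only added to.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


# Usage: record two hypothetical decisions in a throwaway log and read them back.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "decisions.jsonl"
    record_decision(log, "Use a monotonic constraint on income",
                    "Reviewers must be able to explain that more income never lowers the score",
                    ["Unconstrained model"], author="model-owner")
    record_decision(log, "Exclude applications older than five years",
                    "Older applications predate the current product terms",
                    ["Use all available history"], author="model-owner")
    entries = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]
```

Recording the alternatives that were considered keeps the reasoning from the moment of the decision, including the options that were rejected.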

The audit trail as a by-product

The audit trail should not be a thing you build separately and remember to populate; it should fall out of the system's normal operation automatically. Every part of this course has pointed at it: capturing decision inputs (Part 6), recording model versions (Part 6), logging human overrides and their reasons (Part 10), tracking data lineage (Part 7), recording control executions. Done well, the audit trail is simply the accumulated exhaust of a well-instrumented system. The defining property is automaticity: if generating the trail depends on humans remembering to log things, it will be incomplete exactly when you need it most. Instrumentation that records as a side effect of doing the work is reliable; manual logging is not.
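The idea that logging should be a side effect of the work can be sketched in Python as a decorator that wraps the decision function. Every call then leaves an entry, and nobody has to remember to write one. This is a minimal sketch. It assumes an in-memory list standing in for a durable append-only store, and the names `audited`, `decide`, and `AUDIT_LOG` are hypothetical. The decision rule is a placeholder, not a real model.

```python
import functools
from datetime import datetime, timezone
from typing import Any, Callable

AUDIT_LOG: list[dict[str, Any]] = []  # stand-in for a durable, append-only store


def audited(model_version: str) -> Callable:
    """Wrap a decision function so every call is logged without manual effort."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(**inputs: Any) -> Any:
            output = fn(**inputs)
            AUDIT_LOG.append({
                "recorded_at": datetime.now(timezone.utc).isoformat(),
                "function": fn.__name__,
                "model_version": model_version,
                "inputs": dict(inputs),
                "output": output,
            })
            return output
        return wrapper
    return decorator


@audited(model_version="credit-risk-2.3.1")
def decide(income: float, requested_amount: float) -> str:
    # Placeholder rule standing in for a real model's prediction.
    return "approved" if requested_amount <= 0.2 * income else "referred"


decide(income=42000, requested_amount=5000)
decide(income=20000, requested_amount=9000)
```

The wrapper only accepts keyword arguments, a simplification that guarantees inputs are recorded by name. A fuller version would also log calls that raise exceptions, which this sketch does not do.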

A good audit trail has further properties worth designing for: it should be tamper-evident, so its integrity can be trusted; complete, covering every consequential decision rather than a sample; retained for as long as obligations require, which can be years; and queryable, so that reconstructing a specific decision is a search, not an excavation.
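Tamper-evidence is commonly achieved with a hash chain. Each entry stores a hash of its own content combined with the previous entry's hash, so altering any earlier entry breaks every link after it. The sketch below uses Python's standard `hashlib` and `json` modules. The function names and the all-zeros starting hash are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict[str, Any]) -> str:
    """Hash the previous entry's hash together with this payload (canonical JSON)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict[str, Any]], payload: dict[str, Any]) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify(chain: list[dict[str, Any]]) -> bool:
    """Recompute every link; an edited, reordered, or removed interior entry breaks it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain: list[dict[str, Any]] = []
append(chain, {"decision_id": "d-0001", "output": "approved"})
append(chain, {"decision_id": "d-0002", "output": "referred"})
intact = verify(chain)

chain[0]["payload"]["output"] = "referred"  # simulate a quiet after-the-fact edit
tampered = verify(chain)
```

A chain alone cannot show that the most recent entries were deleted. Catching truncation also requires storing the latest hash somewhere independent of the log itself.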

When the trail pays off

The value of all this becomes vivid at the moments of pressure. A regulator opens a thematic review and asks for evidence of how your high-risk models are governed. A customer complains that a decision was unfair and threatens legal action. An internal incident requires you to determine which decisions a faulty model affected. In each case, the firm with documentation and a complete audit trail answers calmly, with evidence, in hours. The firm without them scrambles for weeks, produces partial and reconstructed answers, and signals — accurately — that its governance was never as real as it claimed. As the next part on validation will show, the same evidence that satisfies an audit is what lets you validate and trust the system in the first place. Evidence is not the tax you pay for governance; it is the substance of it.
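The incident case shows why queryability matters. If every entry records its model version and timestamp, finding which decisions a faulty model affected becomes a filter, not an investigation. A minimal sketch, assuming audit entries are dicts with ISO 8601 timestamps. The entries, version strings, and the `affected_decisions` name are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical audit entries; in practice these come from the audit store.
trail: list[dict[str, Any]] = [
    {"decision_id": "d-0001", "model_version": "2.3.0", "at": "2024-03-01T09:00:00+00:00"},
    {"decision_id": "d-0002", "model_version": "2.3.1", "at": "2024-03-02T10:30:00+00:00"},
    {"decision_id": "d-0003", "model_version": "2.3.1", "at": "2024-03-05T14:15:00+00:00"},
    {"decision_id": "d-0004", "model_version": "2.3.2", "at": "2024-03-06T08:00:00+00:00"},
]


def affected_decisions(entries: list[dict[str, Any]], faulty_version: str,
                       start: datetime, end: datetime) -> list[str]:
    """Return IDs of decisions made by the faulty version within [start, end)."""
    return [
        e["decision_id"]
        for e in entries
        if e["model_version"] == faulty_version
        and start <= datetime.fromisoformat(e["at"]) < end
    ]


hits = affected_decisions(
    trail, "2.3.1",
    start=datetime(2024, 3, 1, tzinfo=timezone.utc),
    end=datetime(2024, 3, 6, tzinfo=timezone.utc),
)
```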

The two-in-the-morning test

A useful way to judge whether your documentation and audit trail are adequate is to imagine the moment they will be tested. A regulator has opened an inquiry, or a customer has filed a complaint that is heading to court, and someone asks you to justify a specific decision your system made fourteen months ago. Can you, quickly and without heroics, produce: which model version made it, on what inputs, under what configuration, what the decision was, whether a human was involved and what they concluded, and how that decision sits within a system that was classified, validated, and governed appropriately? If the answer is yes, your evidence is real. If answering requires tracking down the original data scientist, excavating a notebook, and reconstructing from fragments, your evidence exists only in principle — and "in principle" is worthless on a regulator's timeline. Designing to pass this test, for any decision and at scale, is the concrete target that documentation and the audit trail aim at.
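The checklist in this test can also be run as an automated check, so gaps are found before anyone asks. The sketch below assumes the audit trail is indexed by decision ID, with a plain dict standing in for a real store. It fails loudly if any required fact is absent. The field names and `reconstruct` are hypothetical. Linking each decision back to the system's documentation would need a further field, not shown here.

```python
from typing import Any

# Facts the test above says you must be able to produce for any past decision.
REQUIRED = ("model_version", "inputs", "configuration", "output", "human_involvement")


def reconstruct(index: dict[str, dict[str, Any]], decision_id: str) -> dict[str, Any]:
    """Look up one decision and fail loudly if any required fact is missing."""
    record = index.get(decision_id)
    if record is None:
        raise KeyError(f"no audit record for {decision_id}")
    # Check presence, not truthiness: "no human involved" (None) is a valid answer.
    missing = [k for k in REQUIRED if k not in record]
    if missing:
        raise ValueError(f"{decision_id} cannot be fully reconstructed; missing {missing}")
    return {k: record[k] for k in REQUIRED}


index = {
    "d-0042": {
        "model_version": "2.3.1",
        "inputs": {"income": 42000, "requested_amount": 5000},
        "configuration": {"approval_threshold": 0.2},
        "output": "approved",
        "human_involvement": None,  # recorded explicitly: no human in the loop
    },
    "d-0043": {"model_version": "2.3.1", "output": "referred"},  # incomplete entry
}
answer = reconstruct(index, "d-0042")
```

Running a check like this against a sample of past decisions is one way to find out whether the evidence exists in practice or only in principle.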

You are not building documentation for today. You are building it for the worst day, years from now, when someone demands an answer you can no longer reconstruct from memory.

Why reconstructed documentation reads as reconstructed

Regulators and auditors develop a practised eye for documentation written after the fact, and it is worth understanding why contemporaneous records are so much more credible. Documentation produced as decisions are made captures the genuine reasoning of the moment — the alternatives weighed, the trade-offs accepted, the uncertainties acknowledged. Documentation reconstructed before an audit captures something different: a justification, written by people who now know how things turned out, shaped to present the system favourably. The two read differently. Contemporaneous records have the texture of real decision-making, including its doubts and dead ends; reconstructed records have the suspicious smoothness of a story told backwards. Beyond credibility, reconstructed documentation is simply less reliable — memory fades, people leave, and the rationale for choices genuinely evaporates. This is why the habit of documenting as you go is worth more than any template: it is the only way to produce records that are both accurate and believable.

Designing the audit trail as exhaust

The defining property of a trustworthy audit trail is that it is generated automatically, as a side effect of the system operating, rather than assembled by people remembering to log things. We can be concrete about what this means architecturally. Every decision the system makes should, without anyone choosing to record it, leave a complete entry: the inputs, the model version, the configuration, the output, and any human action. Every control that runs should log that it ran and what it found. Every data flow should be traced by the pipeline itself. The resulting trail should also have the four properties described earlier: it should be tamper-evident, complete, retained for as long as obligations require, and queryable. Built this way, the audit trail is not a burden the team carries but the natural exhaust of a well-instrumented system, and the same instrumentation that produces it is what makes monitoring, explanation, and incident response possible.
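Logging that a control ran, and not just that it failed, matters because a missing entry is otherwise indistinguishable from a control that passed. A minimal sketch, assuming each control is a function that returns a pass flag and a finding. Both controls, their thresholds, and the `run_controls` name are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Callable

ControlFn = Callable[[dict[str, Any]], tuple[bool, str]]


def income_in_range(inputs: dict[str, Any]) -> tuple[bool, str]:
    """Hypothetical control: flag incomes outside a plausible range."""
    income = inputs.get("income", 0)
    ok = 0 < income < 10_000_000
    return ok, ("ok" if ok else f"income {income} outside plausible range")


def amount_positive(inputs: dict[str, Any]) -> tuple[bool, str]:
    """Hypothetical control: the requested amount must be positive."""
    amount = inputs.get("requested_amount", 0)
    ok = amount > 0
    return ok, ("ok" if ok else f"requested_amount {amount} is not positive")


def run_controls(decision_id: str, inputs: dict[str, Any],
                 controls: list[ControlFn], log: list[dict[str, Any]]) -> bool:
    """Run every control and log each execution, whether it passed or failed."""
    all_passed = True
    for control in controls:
        passed, finding = control(inputs)
        log.append({
            "recorded_at": datetime.now(timezone.utc).isoformat(),
            "decision_id": decision_id,
            "control": control.__name__,
            "passed": passed,
            "finding": finding,
        })
        all_passed = all_passed and passed
    return all_passed


controls = [income_in_range, amount_positive]
control_log: list[dict[str, Any]] = []
first = run_controls("d-0001", {"income": 42000, "requested_amount": 5000}, controls, control_log)
second = run_controls("d-0002", {"income": -5, "requested_amount": 5000}, controls, control_log)
```

In this sketch every control runs even after an earlier one fails, so the log always shows the full set of checks applied to each decision.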

Documentation as a tool for thinking, not just proving

A final, underappreciated point: good documentation is not only evidence for outsiders; it is a discipline that improves the system itself. The act of writing down why a system was built a certain way, what its limitations are, and how its controls work forces a clarity that purely verbal understanding never demands. Teams that document honestly as they go frequently discover gaps in their own reasoning — a limitation they had not fully confronted, a control that does not actually address the obligation it was meant to, an assumption nobody had stated. In this sense the limitations section, which we earlier called the most credibility-building part of the documentation, is also the most useful internally: enumerating what the system cannot do and where it should not be trusted is exactly the analysis that prevents it from being misused. Documentation written only to impress an auditor misses this benefit entirely; documentation written to genuinely understand and convey the system delivers both the evidence and the insight.


In the next part: testing and validation — the independent assessment that determines whether an AI system is fit to deploy and stays fit over time.

