Lesson 13 of 20 · 10 min read

Agentic AI: Autonomy Under Guardrails

Agentic systems plan, use tools, and act over multiple steps with limited supervision. This part covers why autonomy multiplies both value and risk, and how to bound an agent so it is useful inside a boundary you can define and defend.

So far this course has largely concerned models that make a prediction which a system or person then acts on. A newer and more powerful class of system goes further: agentic AI that plans, decides, and acts over multiple steps, often calling tools and other systems, with limited human supervision between goal and outcome. An agent does not just score a transaction; it can investigate it, gather more information, take remediating action, and move on — autonomously. This autonomy is exactly what makes agents so valuable and exactly what makes risk functions anxious. This part is about resolving that tension: not by forbidding autonomy, but by bounding it.

What makes agentic AI different

Several properties distinguish agents from the predictive systems we have discussed, and each amplifies risk.

A predictive model can be wrong. An agent can be wrong and then act on it, repeatedly, faster than anyone is watching.

The wrong response and the right one

Faced with this, two instincts are both wrong. The first is to ban agentic AI as too risky, forfeiting its substantial value and ceding ground to competitors who will not be so cautious. The second is to deploy it with the same light governance one might apply to a low-stakes predictive model, ignoring how much its autonomy changes the risk. The mature response is neither: it is to let the agent act freely inside a boundary you have deliberately defined, can monitor, and can prove to a regulator that you enforce. Autonomy is not the enemy; unbounded autonomy is. The whole discipline of agentic governance is the construction and enforcement of that boundary.

Defining the boundary

An agent's boundary has several dimensions, and each must be defined explicitly rather than left implicit.

Allowed actions

The most fundamental boundary is what the agent is permitted to do. The safe posture is an explicit allow-list: the agent may use only these specific tools and take only these specific actions, and everything else is forbidden by default. "Default-deny" is far safer than "default-allow with a list of prohibitions", because you cannot enumerate every harmful action in advance, but you can enumerate the helpful ones you intend.
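As a minimal sketch of default-deny, the dispatcher below runs a tool only if its name appears on an explicit allow-list and refuses everything else. The tool names (`lookup_invoice`, `add_case_note`) and the `dispatch` function are hypothetical, chosen only to illustrate the idea; a real agent framework will have its own tool-registration mechanism.

```python
from typing import Callable, Dict


class ActionNotAllowed(Exception):
    """Raised when the agent requests a tool that is not on the allow-list."""


def lookup_invoice(invoice_id: str) -> str:
    # Hypothetical read-only tool, used purely for illustration.
    return f"invoice {invoice_id}: status unknown"


def add_case_note(case_id: str, note: str) -> str:
    # Hypothetical low-impact write tool.
    return f"note added to case {case_id}"


# The allow-list: the only tools the agent may call.
ALLOWED_TOOLS: Dict[str, Callable[..., str]] = {
    "lookup_invoice": lookup_invoice,
    "add_case_note": add_case_note,
}


def dispatch(tool_name: str, **kwargs: str) -> str:
    """Run a tool only if it is explicitly allowed; deny everything else."""
    tool = ALLOWED_TOOLS.get(tool_name)
    if tool is None:
        raise ActionNotAllowed(f"'{tool_name}' is not on the allow-list")
    return tool(**kwargs)
```

Note that nothing in the sketch lists forbidden actions: a request such as `dispatch("delete_account", account_id="42")` fails simply because nobody added it to the list.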

Limits on magnitude and reversibility

Within its allowed actions, the agent should face limits calibrated to consequence: caps on the value it can move and the volume of actions it can take, and, above all, restrictions on how irreversible its actions may be. High-magnitude and irreversible actions should sit outside the agent's autonomous authority entirely, routed to a human checkpoint. An agent that can autonomously do something it cannot undo is an agent that can autonomously cause permanent harm.
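One way to express such limits in code is a small gate that every proposed action passes through. The sketch below is illustrative only: the class names, the cap values, and the three outcomes are assumptions, not a prescribed design.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    HUMAN_CHECKPOINT = "human_checkpoint"
    REFUSE = "refuse"


@dataclass(frozen=True)
class ProposedAction:
    name: str
    value: float       # value moved by the action, in whatever unit you use
    reversible: bool   # can the action be undone after the fact?


@dataclass
class Limits:
    max_value_per_action: float = 100.0  # hypothetical cap
    max_actions_per_run: int = 20        # hypothetical cap
    actions_taken: int = 0

    def decide(self, action: ProposedAction) -> Decision:
        # Volume cap: once the budget is spent, nothing more runs autonomously.
        if self.actions_taken >= self.max_actions_per_run:
            return Decision.REFUSE
        # Irreversible or high-value actions go to a human, never auto-execute.
        if not action.reversible or action.value > self.max_value_per_action:
            return Decision.HUMAN_CHECKPOINT
        self.actions_taken += 1
        return Decision.EXECUTE
```

Under these assumed caps, a reversible refund of 25 executes, a refund of 500 is routed to a human, and an irreversible account closure is routed to a human regardless of its value.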

Scope of access

The agent's reach — the data and systems it can touch — should be confined to what its task genuinely requires, and enforced through real permissions rather than instructions. This is the subject of the next part, but the principle belongs here: an agent's blast radius is bounded by what it can access, so access should be minimal by design.

Plan, then act — visibly

A powerful architectural pattern for agentic safety is to separate planning from acting, and to make the plan inspectable before consequential actions execute. An agent that first produces a plan — "I intend to do A, then B, then C" — creates an opportunity to check that plan against policy before anything irreversible happens, and leaves a clean record of intent for the audit trail. The plan becomes both a safety gate and an explanatory artefact. For the highest-stakes actions, the plan can require human approval before execution; for lower-stakes ones, it can be checked automatically against rules. Either way, visible intent beats opaque action.
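A rough sketch of the plan as both gate and record: the agent submits its whole plan, the plan is checked against policy, and the plan plus the verdict are written to an audit log before any step runs. The `Step` type, the `PERMITTED_ACTIONS` set, and the in-memory log are hypothetical stand-ins for whatever policy engine and audit store you actually use.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    action: str
    target: str


# Hypothetical policy: actions a plan may contain without escalation.
PERMITTED_ACTIONS = {"read_ledger", "recalculate_balance", "add_case_note"}


def check_plan(plan: List[Step]) -> List[str]:
    """Return a list of policy violations; an empty list means the plan passes."""
    return [
        f"step {i}: '{step.action}' is not permitted"
        for i, step in enumerate(plan, start=1)
        if step.action not in PERMITTED_ACTIONS
    ]


def submit_plan(plan: List[Step], audit_log: List[str]) -> bool:
    """Record the plan and the check result before anything executes."""
    violations = check_plan(plan)
    audit_log.append(json.dumps({
        "plan": [asdict(step) for step in plan],
        "violations": violations,
        "approved": not violations,
    }))
    return not violations
```

The rejected plans are logged as well as the approved ones, so the audit trail shows what the agent intended even when it was stopped.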

Containing the blast radius

However well you bound an agent, you should design on the assumption that it will sometimes be wrong, and ensure that a wrong step is survivable. Blast-radius containment means engineering the system so that errors are recoverable and bounded.

The guiding idea is that an agent is safe in proportion to how cheaply you can undo what it does. Invest in recoverability and you can grant more autonomy with less anxiety; neglect it and even a well-bounded agent is a liability.
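To make "cheap to undo" concrete, here is one possible sketch: every action the agent performs is paired with a compensating action, and a rollback replays those compensations in reverse order. The `UndoJournal` class and the `account` dictionary are hypothetical; they stand in for a real system of record, where compensation is usually harder than flipping a flag.

```python
from typing import Any, Callable, Dict, List


class UndoJournal:
    """Records a compensating action for every step the agent takes."""

    def __init__(self) -> None:
        self._undo_stack: List[Callable[[], object]] = []

    def perform(self, do: Callable[[], object], undo: Callable[[], object]) -> None:
        do()
        # Register the undo only once the action has actually happened.
        self._undo_stack.append(undo)

    def rollback(self) -> int:
        """Undo recorded actions in reverse order; return how many were undone."""
        undone = 0
        while self._undo_stack:
            self._undo_stack.pop()()
            undone += 1
        return undone


# Hypothetical account state, standing in for a real system of record.
account: Dict[str, Any] = {"restricted": False, "notes": []}
journal = UndoJournal()
journal.perform(
    do=lambda: account["notes"].append("arrears suspected"),
    undo=lambda: account["notes"].pop(),
)
journal.perform(
    do=lambda: account.update(restricted=True),
    undo=lambda: account.update(restricted=False),
)
```

Calling `journal.rollback()` at this point lifts the restriction and removes the note, in that order. An action for which no honest `undo` can be written is a signal that it belongs behind a human checkpoint rather than in the journal.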

Governing agents within the framework

Agentic systems do not need a separate governance universe; they need the framework of this course applied with the dial turned up. They must be classified — and their autonomy and irreversibility push them toward higher tiers (Part 3). They need clear ownership, with an accountable human who can stop them (Part 4). They need validation, which for agents leans heavily on behavioural testing across many scenarios, including adversarial ones, because their multi-step behaviour cannot be validated by checking single predictions (Part 12). They need explainability — the plan-then-act pattern is partly an explainability mechanism. And they need intensive monitoring, because their behaviour can drift in ways that single-prediction monitoring would miss. Agentic AI is, in a sense, a stress test of the entire framework: everything that was important for predictive systems becomes critical when the system can act on its own.

Give an agent the freedom to be genuinely useful inside a boundary you could describe to a regulator on a single page — and prove you enforce.

How small errors become large harms

The defining risk of agentic systems is that errors compound across steps, and a concrete picture makes the danger vivid. Imagine an agent tasked with resolving a billing discrepancy. It misreads one figure early — a small, ordinary model error of the kind any system makes occasionally. But because it acts on that misreading, the next step builds on the error: it concludes the account is in arrears, then issues a notice, then restricts the account, then escalates to collections — each step locally reasonable given the last, each compounding the original mistake into a cascade that ends with a wronged customer and a mess to unwind. In a single-prediction system, that first error would have produced one wrong output that a downstream check might catch. In an agentic system, the error propagates through a chain of actions before anyone is watching, and the harm at the end bears no proportion to the smallness of the mistake at the start. This compounding is why agentic systems demand checkpoints, blast-radius containment, and visible plans: you are not guarding against a single wrong answer but against a wrong answer that acts on itself.
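The billing example can be reduced to a toy simulation. In the sketch below, the agent misreads a balance and then walks the chain of actions described above; an optional checkpoint re-verifies the figure against the source of record before a named step. The step names, the sign convention (a negative balance means arrears), and the `run_chain` function are all illustrative assumptions.

```python
from typing import List, Optional

# The chain from the billing example; step names are illustrative.
STEPS = [
    "conclude_arrears",
    "issue_notice",
    "restrict_account",
    "escalate_to_collections",
]


def run_chain(
    figure_read: float,
    figure_actual: float,
    checkpoint_before: Optional[str] = None,
) -> List[str]:
    """Return the steps taken on the strength of the figure the agent read.

    If checkpoint_before names a step, the figure is re-verified against the
    source of record before that step, and the chain stops on a mismatch.
    """
    taken: List[str] = []
    if figure_read >= 0:
        return taken  # no arrears concluded, so nothing follows
    for step in STEPS:
        if step == checkpoint_before and figure_read != figure_actual:
            break  # the misreading is caught before it propagates further
        taken.append(step)
    return taken
```

With an actual balance of 120 misread as -120, `run_chain(-120.0, 120.0)` takes all four steps, while `run_chain(-120.0, 120.0, checkpoint_before="issue_notice")` stops after the internal conclusion and before any action reaches the customer.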

In a predictive system, an error is a wrong answer. In an agentic system, an error is the first move in a sequence that can amplify it beyond recognition before anyone intervenes.

The plan-then-act pattern in depth

Separating planning from action is one of the most powerful safety patterns for agents, and it repays a closer look at why it works. When an agent first produces an explicit plan — "I intend to do A, then B, then C" — before executing anything consequential, it creates three distinct benefits at once. It opens a checkpoint: the plan can be reviewed, by a rule or a human, before any irreversible action occurs, catching a flawed plan while it is still only a proposal. It creates an explanatory artefact: the recorded plan is a clear statement of the agent's intent, invaluable for both the audit trail and after-the-fact understanding of why the agent did what it did. And it imposes structure on the agent's behaviour, making it more predictable and testable than an agent that acts step by step with no declared intent. The pattern can be tuned to risk: high-stakes plans require human approval before execution; lower-stakes ones are checked automatically against policy; trivial ones proceed. The common thread is that visible intent is governable in a way that opaque, moment-to-moment action is not.
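The tuning-to-risk idea can be sketched as a small router that decides which review a plan must pass before it executes. The three tiers mirror the ones named above; the `Tier` enum, the `route_plan` function, and the two callback parameters are hypothetical, and how a plan is assigned its tier is deliberately left out.

```python
from enum import Enum
from typing import Callable, List


class Tier(Enum):
    TRIVIAL = 1
    LOWER = 2
    HIGH = 3


def route_plan(
    steps: List[str],
    tier: Tier,
    policy_check: Callable[[List[str]], bool],
    human_approves: Callable[[List[str]], bool],
) -> bool:
    """Return True if the plan may execute, applying the review its tier requires."""
    if tier is Tier.HIGH:
        # High-stakes plans wait for an explicit human decision.
        return human_approves(steps)
    if tier is Tier.LOWER:
        # Lower-stakes plans are checked automatically against policy.
        return policy_check(steps)
    # Trivial plans proceed without review.
    return True
```

A reasonable variation, not shown here, is to run the automatic policy check on high-stakes plans as well, so that a human is only asked to approve plans that already pass the rules.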

Designing the kill switch you will actually trust

Every agentic system needs a reliable way to stop it, and the phrase "kill switch" makes this sound simpler than it is. A kill switch is only worth having if it genuinely works under the conditions where you will need it, which are precisely the chaotic, high-pressure conditions of an incident. Several properties separate a real stop mechanism from a comforting illusion. It must be reliable: it actually halts the agent, promptly and completely, rather than merely scheduling a graceful shutdown that an in-progress action can outrun. It must be reachable: the accountable owner and the incident responders must have both the authority and the practical access to trigger it without a scramble for permissions. It must be understood: people must know it exists, where it is, and how to use it, before the crisis rather than during it. And its effects must be safe: stopping the agent mid-task should leave the system in a recoverable state, not a corrupted one. An untested kill switch is a hope, not a control; the only kill switch you can trust is one you have exercised in advance and know will work.
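A minimal sketch of a stop mechanism that can be exercised in a drill: the agent loop consults a shared flag before every action and refuses to start another once the flag is set. The `KillSwitch` and `run_agent` names are hypothetical, and the sketch assumes each action is atomic, so the switch takes effect between actions rather than interrupting one in progress.

```python
import threading
from typing import Callable, List


class KillSwitch:
    """A stop flag the agent must consult before every action."""

    def __init__(self) -> None:
        # threading.Event is safe to set from another thread, e.g. an operator console.
        self._stopped = threading.Event()

    def trigger(self) -> None:
        self._stopped.set()

    def is_triggered(self) -> bool:
        return self._stopped.is_set()


def run_agent(actions: List[Callable[[], object]], switch: KillSwitch) -> int:
    """Run actions in order, checking the switch before each one.

    Returns the number of actions completed. Each action is assumed to be
    atomic, so stopping between actions leaves a recoverable state.
    """
    completed = 0
    for action in actions:
        if switch.is_triggered():
            break
        action()
        completed += 1
    return completed
```

Because the check happens between actions, the longest single action sets the worst-case delay before the agent halts; keeping actions small is therefore part of making the switch prompt. The same loop also makes the switch easy to test: trigger it partway through a rehearsal run and assert that no further action starts.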

Agentic governance is the framework intensified

It bears repeating that agentic systems do not need a parallel governance universe; they need every discipline of this course applied with the intensity their autonomy demands. Their irreversibility and reduced oversight push them toward higher risk tiers. Their ownership must include someone who can actually stop them. Their validation leans heavily on behavioural testing across many scenarios, including adversarial ones, because single-prediction validation cannot capture multi-step behaviour. Their explainability is served partly by the plan-then-act pattern. Their monitoring must be more intensive, because their behaviour can drift in ways predictive monitoring would miss. And their security is sharper, because an agent with permissions is a target whose subversion hands an attacker real capabilities. In this sense, agentic AI is a stress test of the entire framework: it takes every control that mattered for predictive systems and makes it indispensable. An organisation that has built the disciplines of this course well is ready for agents; one that has treated those disciplines as optional will find agentic systems unforgiving.


In the next part: tooling, permissions, and blast-radius containment — the concrete mechanisms that enforce an agent's boundary through least-privilege access rather than trust.

