No aspect of regulated AI draws more scrutiny — from regulators, courts, journalists, and the public — than fairness. When an AI system treats people unequally on the basis of who they are, it does more than make errors; it inflicts a particular kind of harm that the law and society treat with special gravity. Yet fairness is also the most conceptually slippery topic in the field, because it is not one thing, cannot be fully satisfied, and forces genuine value choices that no algorithm can make for you. This part gives you the conceptual and practical equipment to take fairness seriously without pretending it is simpler than it is.
Where bias comes from
The first myth to dispel is that bias is primarily about prejudiced programmers. Almost always, bias enters through data and design choices, not malice — which is why it is so pervasive and so easy to miss. The main sources:
- Historical bias. The data reflects a world that was already unequal. A model trained on past lending decisions learns the discrimination embedded in them and reproduces it, faithfully and at scale. The model is not wrong about the data; the data is a record of injustice.
- Representation bias. Some groups are under-represented in the training data, so the model learns them less well and serves them worse — often invisibly, because aggregate accuracy looks fine while performance for a minority is poor.
- Measurement bias. The features or labels mean different things for different groups, or are measured with different accuracy across them, so the model's inputs are themselves skewed.
- Proxy variables. Even when you exclude a protected attribute, other features correlate with it — postcode with ethnicity, purchase history with gender — and the model reconstructs the protected attribute from its proxies. "We didn't use race" is no defence when the model inferred it anyway.
- Aggregation bias. A single model applied to groups that genuinely differ may serve none of them well, when separate treatment would have served all of them better.
You can build a biased system without a single biased intention. The bias was in the data, the sampling, and the proxies — the model simply found it.
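Representation bias in particular is easy to demonstrate. The following is a minimal Python sketch, using invented toy data and a hypothetical helper named `accuracy_by_group`, of how a healthy aggregate accuracy can hide poor performance for a small group. The group labels and counts are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and a dict of accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Hypothetical toy data: 90 records from group "A", 10 from group "B".
groups = ["A"] * 90 + ["B"] * 10
y_true = [1] * 100
# Group A: 88 of 90 predictions correct. Group B: 4 of 10 correct.
y_pred = [1] * 88 + [0] * 2 + [1] * 4 + [0] * 6

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.2f}")
```

In this toy case the overall accuracy is 0.92 while accuracy for group B is 0.40, which is the pattern the representation-bias bullet describes: the aggregate figure alone would not reveal the problem.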
Fairness has no single definition
The deepest difficulty is that fairness is not mathematically unique. Researchers have formalised many distinct, intuitively reasonable definitions, and a famous and unsettling result is that several of them are mutually incompatible: you cannot satisfy them all at once except in special cases, such as when the groups have identical base rates or the model predicts perfectly. A few of the major notions:
- Group parity — outcomes (such as approval rates) should be equal across groups.
- Equal error rates — the model should be wrong equally often, in the same ways, for each group.
- Calibration — a given score should mean the same thing regardless of group.
- Individual fairness — similar individuals should be treated similarly, regardless of group.
These pull in different directions. Improving equality of one kind can worsen another. This is not a flaw to be engineered away; it is a reflection of the fact that fairness encodes value judgements about which reasonable people, and different legal traditions, disagree. The practical consequence is profound: you must choose, explicitly and with justification, which notion of fairness your system pursues, because you cannot have them all, and pretending otherwise just means the choice gets made implicitly and undefended.
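To make the first three notions tangible, here is a small Python sketch that computes one plausible metric for each from binary labels and predictions. The function names and toy data are hypothetical. Precision is used here only as a coarse stand-in for calibration when the model emits a yes/no flag rather than a score, and individual fairness is not covered because it needs a similarity measure between individuals.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def group_metrics(y_true, y_pred):
    """Metrics for one group, one per fairness notion discussed above."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "selection_rate": safe_ratio(tp + fp, len(y_true)),  # group parity
        "false_positive_rate": safe_ratio(fp, fp + tn),      # error rates
        "false_negative_rate": safe_ratio(fn, fn + tp),      # error rates
        "precision": safe_ratio(tp, tp + fp),                # calibration proxy
    }


# Hypothetical toy data for two groups of eight people each.
metrics_a = group_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
metrics_b = group_metrics([1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])

for name in metrics_a:
    print(f"{name}: A={metrics_a[name]:.3f}  B={metrics_b[name]:.3f}")
```

In this toy data both groups have the same selection rate (3 of 8), so group parity holds, yet the false positive rates, false negative rates, and precision all differ. Satisfying one notion says little about the others.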
Measuring fairness
Whatever definition you adopt, fairness must be measured, not assumed. The core practice is disparate-impact testing: examining whether the system's outcomes and error rates differ across protected groups, and by how much. This requires care.
- Decide what to measure. Outcomes, error rates, calibration — driven by the fairness notion you have chosen and the obligations you face.
- Obtain the data to measure it. Testing for disparate impact across groups requires knowing group membership, which collides with the instinct (and sometimes the rule) not to collect protected attributes. This tension is real and must be navigated deliberately, sometimes through carefully governed special handling of protected data solely for fairness testing.
- Test for proxies. Excluding a protected attribute is not enough; you must check whether the model reconstructs it from correlated features, and measure impact on that basis.
- Test continuously. Fairness is not a one-time pre-launch check. A model fair at launch can drift into unfairness as data shifts, so disparate-impact testing belongs in ongoing monitoring.
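One common way to put a number on outcome disparity is the ratio of the lowest group selection rate to the highest. The sketch below computes that ratio and applies it to successive monitoring windows. The function names, window labels, and data are hypothetical. The default threshold of 0.8 is an assumption borrowed from the "four-fifths" rule of thumb used in some US employment contexts, not a figure this text prescribes; the right threshold for a given system is a legal and policy question.

```python
def selection_rates(decisions, groups):
    """Return the share of positive decisions (1s) for each group."""
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + decision
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions, groups):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(decisions, groups)
    highest = max(rates.values())
    # NaN when nobody at all was selected, so the ratio is undefined.
    return min(rates.values()) / highest if highest > 0 else float("nan")


def flag_windows(windows, threshold=0.8):
    """Return (label, ratio) for each window whose ratio is below threshold."""
    flagged = []
    for label, decisions, groups in windows:
        ratio = disparate_impact_ratio(decisions, groups)
        if ratio < threshold:  # NaN compares False, so it is never flagged
            flagged.append((label, ratio))
    return flagged


# Hypothetical monitoring data: 10 applicants per group in each window.
groups = ["A"] * 10 + ["B"] * 10
window_1 = ("window-1", [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5, groups)
window_2 = ("window-2", [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7, groups)

flagged = flag_windows([window_1, window_2])
print(flagged)
```

The first window has equal approval rates and passes; in the second, group B's rate is half of group A's and the window is flagged. Running the same check on every window, rather than once before launch, is what turns the test into monitoring.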
Mitigating bias
When testing reveals unfairness, mitigation techniques operate at three stages, and the choice among them carries trade-offs:
- Pre-processing — adjusting the training data to reduce bias before the model sees it, for instance by re-balancing under-represented groups or correcting skewed labels.
- In-processing — building fairness constraints directly into the model's training objective, so it optimises for accuracy and fairness together.
- Post-processing — adjusting the model's outputs after the fact to equalise outcomes or error rates across groups.
Each has costs. Mitigation often trades some aggregate accuracy for greater fairness — a trade-off that is itself a value choice requiring documented justification. Some techniques raise their own legal questions, because adjusting outcomes by group can edge toward the very group-based treatment anti-discrimination law restricts. There is genuine tension here between different legal principles, and resolving it requires legal input, not just technical skill.
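As one illustration of the pre-processing stage, the sketch below implements reweighing, a technique described by Kamiran and Calders in which each training record receives a weight so that group membership and label look statistically independent in the weighted data. This is a swapped-in example of the general idea, not a recommendation; the function names and toy data are hypothetical, and whether any such adjustment is appropriate for a given system is a question for legal review as well as engineering.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each record by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    positive = sum(w for g, y, w in zip(groups, labels, weights)
                   if g == group and y == 1)
    total = sum(w for g, w in zip(groups, weights) if g == group)
    return positive / total


# Hypothetical training data: group A has 4 of 6 positive labels,
# group B has 1 of 4.
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
print(f"weighted positive rate: A={rate_a:.2f}  B={rate_b:.2f}")
```

Before weighting, the positive-label rates are about 0.67 for group A and 0.25 for group B; after weighting, both are 0.50. The weights would then be passed to a training routine that accepts per-sample weights. What this costs in accuracy on the original distribution has to be measured separately.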
Fairness is socio-technical, not just technical
The most important lesson is that fairness cannot be fully solved with mathematics, because at its core it is about values and context, not just code. The choice of fairness definition, the acceptable trade-off against accuracy, the question of which groups warrant protection and in what way — these are decisions for the organisation, informed by law, ethics, and the specific context of use, made by accountable humans rather than delegated to an optimisation routine. The technical tools are necessary but not sufficient. A team that treats fairness as a metric to maximise has misunderstood the problem; a team that treats it as a deliberate, documented, accountable set of value choices — supported by rigorous measurement — has grasped it.
Documenting fairness decisions
Because fairness involves contestable choices, documentation is doubly important. For each high-risk system you should be able to show which fairness notion you adopted and why, what disparate-impact testing you performed and what it found, what mitigations you applied and at what cost, and who made the value calls along the way. This record is your defence when — not if — the system's fairness is questioned. A system whose fairness choices are explicit, reasoned, and evidenced is defensible even where a critic would have chosen differently; one whose fairness was never deliberately considered is indefensible the moment disparity is found.
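A record like this can be kept as structured data so that gaps are easy to spot. The following is a hypothetical sketch only: the class name, field names, and example values are all assumptions, and a real record would follow whatever schema your governance process defines.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FairnessDecisionRecord:
    """One system's fairness choices, mirroring the items listed above."""
    system_name: str
    fairness_notion: str          # which notion was adopted
    rationale: str                # why that notion
    tests_performed: list = field(default_factory=list)
    findings: str = ""            # what the testing found
    mitigations: list = field(default_factory=list)
    mitigation_costs: str = ""    # what the mitigations cost
    decided_by: str = ""          # who made the value calls

    def missing_fields(self):
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical example values, for illustration only.
record = FairnessDecisionRecord(
    system_name="loan-scoring-model",
    fairness_notion="equal error rates",
    rationale="wrongful refusals are the main harm in this context",
    tests_performed=["false positive rate by group"],
    findings="disparity found between two groups",
    mitigations=["reweighted training data"],
    mitigation_costs="small drop in aggregate accuracy",
)
print(record.missing_fields())
```

Here the only empty field is `decided_by`, which flags that nobody has yet been recorded as owning the decision. An empty `mitigations` list would also be reported, so a system with no mitigation applied should say so explicitly rather than leave the field blank.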
The impossibility result, made concrete
The claim that fairness definitions can be mutually incompatible sounds abstract until you see it bite. Consider a model producing risk scores for two groups whose underlying base rates genuinely differ, not because of anything inherent to the groups but because of historical conditions the data reflects. You might want the model to be calibrated: a score of "high risk" should mean the same actual risk regardless of group. You might also want equal error rates: the model should wrongly flag people from each group at the same rate, and wrongly clear them at the same rate. When base rates differ, you cannot have both at once unless the model predicts perfectly: pushing toward one pulls you away from the other. This is not a failure of cleverness that a better algorithm will fix; it is a mathematical fact about what these definitions demand.
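A few lines of arithmetic show the conflict. For a binary "high risk" flag, the false positive rate is fixed by the base rate, the precision of the flag (the share of flagged people who are truly high risk), and the true positive rate. The Python sketch below uses that identity with invented numbers; the function name and all values are hypothetical, and treating precision as "calibration" is a simplification for a yes/no flag.

```python
def implied_false_positive_rate(base_rate, precision, true_positive_rate):
    """False positive rate implied by the other three quantities.

    Follows from the definitions:
      precision = TP / (TP + FP)
      TP = true_positive_rate * base_rate * N
      FP = false_positive_rate * (1 - base_rate) * N
    Solving for false_positive_rate gives the expression below.
    """
    odds = base_rate / (1 - base_rate)
    return odds * ((1 - precision) / precision) * true_positive_rate


# Hold precision and true positive rate equal across both groups,
# so the flag "means the same thing" and misses are equally common.
precision = 0.6
true_positive_rate = 0.7

# Hypothetical base rates that differ between the groups.
fpr_a = implied_false_positive_rate(0.5, precision, true_positive_rate)
fpr_b = implied_false_positive_rate(0.3, precision, true_positive_rate)
print(f"implied false positive rate: A={fpr_a:.3f}  B={fpr_b:.3f}")
```

With these numbers the false positive rate comes out at roughly 0.47 for the group with the higher base rate and 0.20 for the other. Equalising the false positive rates would force the precision or the true positive rate to differ between groups instead.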
The lesson is profound and liberating: there is no escape into pure technique. Because you cannot satisfy every reasonable notion of fairness simultaneously, you must choose which to prioritise, and that choice is a value judgement informed by law, ethics, and context — not something an optimiser can settle. A team that does not choose explicitly has still chosen, implicitly and undefended, whichever notion its tooling happened to optimise. Making the choice deliberate, reasoned, and documented is the difference between a defensible fairness posture and an accidental one.
You cannot be fair in every sense at once. The mature question is not "is it fair?" but "which fairness did we choose, why, and at what cost?"
The protected-attribute paradox
Fairness work runs into a genuine paradox that teams must navigate rather than wish away. To test whether a system treats groups equitably, you need to know group membership — yet collecting protected attributes like ethnicity or gender sits uneasily with both the instinct to avoid such data and, sometimes, the rules around it. You cannot measure disparate impact across a group whose membership you refuse to record. The resolution is usually not to avoid the data but to handle it deliberately: collecting or inferring protected attributes under careful governance, using them solely for fairness testing rather than for decisions, and protecting them with particular rigour. This is delicate territory that needs legal input, because the line between "using protected data to test for fairness" and "using protected data to make decisions" is exactly the line anti-discrimination law polices. But refusing to engage — declining to collect the data and therefore being unable to detect unfairness — is not a neutral choice; it is choosing not to know, which is rarely defensible when the system makes consequential decisions about people.
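One way to picture "solely for fairness testing" is to keep protected attributes in a separate store that the decision path never reads, and to join them to decisions only inside the audit code. The sketch below is a hypothetical illustration of that separation: the field names, store layout, and function names are all assumptions, and real access control would be enforced by infrastructure and governance rather than by a Python function.

```python
# Hypothetical names of protected fields; a real list is a legal question.
PROTECTED_FIELDS = frozenset({"ethnicity", "gender"})


def model_inputs(record):
    """Copy of an applicant record with protected fields removed."""
    return {k: v for k, v in record.items() if k not in PROTECTED_FIELDS}


def audit_rates(decisions, protected_store, attribute):
    """Approval rate per value of one protected attribute.

    decisions: {applicant_id: 0 or 1}
    protected_store: {applicant_id: {attribute_name: value}}
    Applicants with unknown membership are skipped.
    """
    totals, approved = {}, {}
    for applicant_id, decision in decisions.items():
        value = protected_store.get(applicant_id, {}).get(attribute)
        if value is None:
            continue
        totals[value] = totals.get(value, 0) + 1
        approved[value] = approved.get(value, 0) + decision
    return {v: approved[v] / totals[v] for v in totals}


# Hypothetical data: applicant 4 has no recorded membership.
decisions = {1: 1, 2: 0, 3: 1, 4: 1}
protected_store = {1: {"gender": "F"}, 2: {"gender": "F"}, 3: {"gender": "M"}}

print(model_inputs({"income": 40000, "gender": "F"}))
print(audit_rates(decisions, protected_store, "gender"))
```

Stripping the fields from the model's inputs does not stop the model inferring them from correlated features, so this separation supports the audit rather than replacing it.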
Proxies and the limits of "we didn't use it"
A defence teams reach for instinctively is that they excluded protected attributes from the model — "we didn't use race." Earlier we noted this is no defence when proxies reconstruct the attribute; it is worth dwelling on why, because the instinct is so strong. In rich datasets, protected characteristics are frequently encoded, often densely, in seemingly neutral features: location correlates with ethnicity, purchase and browsing patterns with gender, name and language with national origin. A capable model, asked to predict an outcome that historically correlated with a protected attribute, will reconstruct that attribute from its proxies and use it — without ever seeing the attribute directly. Excluding the protected feature changes nothing if the model can infer it. This is why fairness must be tested on outcomes across groups, not assumed from inputs: the only way to know whether a system treats groups equitably is to measure whether it does, regardless of what features it nominally uses. A team that audits its feature list and declares victory has tested the wrong thing.
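A rough way to check whether a nominally neutral feature acts as a proxy is to ask how well it predicts group membership on its own. The sketch below compares two accuracies: always guessing the largest group, and guessing the most common group for each feature value. The function name, postcode labels, and counts are hypothetical. The check is measured on the same data it is fitted to and looks at one feature at a time, so it understates what a capable model could reconstruct from many features combined; it complements outcome testing across groups and does not replace it.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, groups):
    """Return (baseline_accuracy, proxy_accuracy) for one feature.

    baseline_accuracy: accuracy of always guessing the largest group.
    proxy_accuracy: accuracy of guessing, for each feature value,
    the group most often seen with that value.
    """
    n = len(groups)
    baseline = Counter(groups).most_common(1)[0][1] / n
    by_value = defaultdict(Counter)
    for value, group in zip(feature_values, groups):
        by_value[value][group] += 1
    hits = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, hits / n


# Hypothetical data: postcode "P1" is mostly group A, "P2" mostly group B.
postcodes = ["P1"] * 10 + ["P2"] * 10
groups = ["A"] * 9 + ["B"] * 1 + ["A"] * 2 + ["B"] * 8

baseline, proxy = proxy_strength(postcodes, groups)
print(f"baseline={baseline:.2f}  using postcode={proxy:.2f}")
```

Here postcode alone lifts the guess from 0.55 to 0.85, so a model given postcode has most of the information about group membership even though the attribute itself was never supplied.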
Mitigation as a documented trade-off
When testing reveals unfairness and you apply mitigation, you are almost always giving something up: usually some aggregate accuracy, and sometimes a comfortable distance from the legal line around group-based adjustment. The mature practice is to treat each mitigation as a documented decision: what disparity was found, which mitigation was chosen, what it cost in other terms, what alternatives were considered, and who made the call (an accountable human, informed by legal and ethical input). This record matters because fairness decisions are contestable by nature, and the difference between a defensible system and an indefensible one is rarely whether a critic would have chosen identically. It is whether the choice was deliberate, reasoned, evidenced, and owned, or whether fairness was never genuinely confronted at all. A system whose fairness trade-offs are explicit and justified can be defended even where reasonable people would draw the line differently; a system that never made the trade-offs consciously cannot be defended the moment disparity is found.
In the next part: human-in-the-loop design — how to place human judgement in the decision flow so that oversight is meaningful rather than a rubber stamp.
