Probability Calibration

Whether a model's predicted probabilities mean what they say — and how to fix them with Platt scaling or isotonic regression when they don't.

Figure: a reliability diagram plotting observed frequency against predicted probability, with the diagonal marking perfect calibration. Points below the diagonal mean the model is overconfident.

Platt scaling (fitting a sigmoid to the model's scores) or isotonic regression (fitting a monotonic step function) can pull a miscalibrated curve back onto the diagonal.
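To make the two methods concrete, here is a minimal pure-Python sketch of each. It is an illustration under stated assumptions, not a reference implementation. The function names (fit_platt, platt_predict, fit_isotonic), the learning rate, the step count, and the toy data are all hypothetical. The Platt fit here is a plain one-dimensional logistic regression trained by gradient descent, without the target smoothing used in Platt's original formulation. The isotonic fit uses the pool-adjacent-violators procedure and assumes distinct scores.

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_predict(score, a, b):
    """Map a raw score to a calibrated probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: return (sorted scores, non-decreasing fitted values)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, number of points]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted

# Hypothetical held-out calibration data: raw scores (read as logits) and true labels.
scores = [-4, -3, -2, -1, 1, 2, 3, 4]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
raw_top = 1.0 / (1.0 + math.exp(-4))  # uncalibrated sigmoid of the top score, about 0.98
platt_top = platt_predict(4, a, b)    # after scaling, noticeably less extreme

sorted_scores, iso_values = fit_isotonic(scores, labels)
print(a, b, raw_top, platt_top)
print(iso_values)
```

On this toy data the fitted slope comes out below 1, so the sigmoid is flattened and extreme scores map to less extreme probabilities. Either calibrator would normally be fitted on held-out data rather than on the data used to train the classifier. In practice a library is the usual route: scikit-learn, for example, offers both methods through CalibratedClassifierCV.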

Definition

A classifier is calibrated if its predicted probabilities mean what they say: among all the times it predicts "70% chance of rain," it should actually rain about 70% of the time. A model can be highly accurate (correctly classifying most examples) while being badly calibrated (its probability numbers are misleading) — accuracy and calibration measure different things.
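The definition above suggests a direct check: group predictions by their stated probability and compare each group's average prediction with how often the event actually happened. The sketch below does this with equal-width bins. The function name reliability_table, the bin count, and the forecast data are hypothetical choices made for illustration.

```python
def reliability_table(probs, labels, n_bins=10):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table

# Hypothetical forecasts: ten "70%" predictions (7 came true)
# and ten "90%" predictions (only 6 came true).
probs = [0.7] * 10 + [0.9] * 10
labels = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4

table = reliability_table(probs, labels)
for mean_pred, observed, count in table:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

In this made-up data the 70% forecasts are calibrated, while the 90% forecasts come true only 60% of the time, which is the overconfident pattern.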

Overconfidence in practice

A spam filter that labels everything it flags "99% spam," but is only actually right 80% of the time on those flagged emails, is overconfident — it's a useful classifier but a poor probability estimator. Calibration is about fixing that gap.
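The size of that gap is simple arithmetic. The snippet below works it through for an assumed batch of 1,000 flagged emails. The batch size is hypothetical; only the 99% and 80% figures come from the example above.

```python
stated_confidence = 0.99  # probability the filter attaches to every flagged email
flagged = 1000            # hypothetical number of flagged emails
actually_spam = 800       # 80% of the flagged emails really are spam

observed_precision = actually_spam / flagged
calibration_gap = stated_confidence - observed_precision

summary = (f"stated {stated_confidence:.2f}, "
           f"observed {observed_precision:.2f}, "
           f"gap {calibration_gap:.2f}")
print(summary)  # stated 0.99, observed 0.80, gap 0.19
```

A calibrated filter with the same decisions would report roughly 0.80 for this group rather than 0.99.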

Try it

Could a classifier be perfectly calibrated but have poor accuracy?

Solution

Yes. A model that always predicts "50% chance" for everything, where the true base rate genuinely is 50%, is perfectly calibrated — its stated probability matches the long-run frequency — but it provides zero discriminative power between classes, so its accuracy (as a hard classifier) is no better than a coin flip.
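The same argument can be checked numerically. This sketch builds a perfectly balanced, hypothetical label set, applies the constant 50% predictor, and measures both properties. The data and the rule of sending ties at 0.5 to class 1 are assumptions made for the illustration.

```python
labels = [0, 1] * 500        # balanced classes: the base rate is exactly 50%
probs = [0.5] * len(labels)  # the model says "50%" for every example

# Calibration: among all predictions of 0.5, how often is the label 1?
observed_frequency = sum(labels) / len(labels)

# Accuracy: threshold at 0.5 (ties go to class 1), so every example is predicted as 1.
predictions = [1 if p >= 0.5 else 0 for p in probs]
accuracy = sum(int(pred == y) for pred, y in zip(predictions, labels)) / len(labels)

print(observed_frequency, accuracy)  # 0.5 0.5
```

The stated probability matches the observed frequency exactly, yet the hard predictions are right only half the time.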

Related concepts