Quadratic Discriminant Analysis

Like LDA, but allows each class its own covariance matrix, giving quadratic rather than linear decision boundaries.

[Figure: QDA decision boundary with class-specific covariance shapes. Labels: $\Sigma_0$, $\Sigma_1$, curved boundary.]

Unlike LDA, QDA lets each class have its own covariance matrix, so the quadratic terms do not cancel.

Definition

Quadratic Discriminant Analysis (QDA) is a generative classifier similar to LDA, but each class has its own covariance matrix $\Sigma_k$ instead of a shared one.

Model: class $k$ follows $\mathcal{N}(\boldsymbol{\mu}_k, \Sigma_k)$ with prior $\pi_k$.

Because the quadratic terms in $\mathbf{x}$ no longer cancel (different $\Sigma_k$ per class), the decision boundary between classes is quadratic (a conic section in 2D: ellipse, parabola, or hyperbola).
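To see where the quadratic term comes from, it can help to write out the log-posterior score of class $k$, dropping constants shared by all classes. This is the standard Gaussian derivation, sketched with the symbols defined above; the names $\delta_k$ and $c_{jk}$ are introduced here for illustration only.

```latex
\delta_k(\mathbf{x}) = \log \pi_k
  - \tfrac{1}{2}\log\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^\top \Sigma_k^{-1} (\mathbf{x}-\boldsymbol{\mu}_k)

% Boundary between classes j and k: \delta_j(\mathbf{x}) = \delta_k(\mathbf{x})
-\tfrac{1}{2}\,\mathbf{x}^\top\!\left(\Sigma_j^{-1} - \Sigma_k^{-1}\right)\mathbf{x}
  + \mathbf{x}^\top\!\left(\Sigma_j^{-1}\boldsymbol{\mu}_j - \Sigma_k^{-1}\boldsymbol{\mu}_k\right)
  + c_{jk} = 0

c_{jk} = \log\frac{\pi_j}{\pi_k}
  - \tfrac{1}{2}\log\frac{\lvert\Sigma_j\rvert}{\lvert\Sigma_k\rvert}
  - \tfrac{1}{2}\boldsymbol{\mu}_j^\top \Sigma_j^{-1} \boldsymbol{\mu}_j
  + \tfrac{1}{2}\boldsymbol{\mu}_k^\top \Sigma_k^{-1} \boldsymbol{\mu}_k
```

When $\Sigma_j = \Sigma_k$, the first term vanishes and only the linear part remains, which is the LDA boundary.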

This gives QDA more flexibility than LDA: it can model classes whose spreads differ in shape, size, and orientation.

Key properties
  • Decision boundaries are quadratic: in 2D they are conic sections (ellipses, parabolas, or hyperbolas), degenerating to a straight line when the class covariances are equal
  • Reduces exactly to LDA when all class covariances happen to be equal
  • More flexible than LDA, at the cost of estimating far more parameters
  • A generative model: it models the full class-conditional distribution, not just the boundary
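The reduction to LDA under equal covariances can be checked numerically. The sketch below is plain Python restricted to 2D so the matrix algebra stays explicit; the function names, class means, and covariance values are made up for illustration. It measures the second difference of $\delta_0 - \delta_1$ along a segment, which is zero for a linear function and generally nonzero for a quadratic one.

```python
import math

def delta(x, mean, cov, prior):
    """Gaussian log-discriminant for a 2D point; cov = (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha

def boundary_curvature(cov0, cov1, a=(0.0, 0.0), b=(2.0, 1.0)):
    """Second difference of delta_0 - delta_1 along the segment from a to b."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

    def g(x):
        # Illustrative class means (0, 0) and (3, 1), equal priors.
        return delta(x, (0.0, 0.0), cov0, 0.5) - delta(x, (3.0, 1.0), cov1, 0.5)

    return g(a) - 2 * g(mid) + g(b)

shared = (1.0, 0.3, 2.0)   # same covariance for both classes
other = (0.5, 0.0, 0.5)    # a different covariance for class 1

flat = boundary_curvature(shared, shared)    # zero up to rounding: linear
curved = boundary_curvature(shared, other)   # clearly nonzero: quadratic
print(flat, curved)
```

With a shared covariance the score difference has no curvature, so the boundary is a line; giving class 1 its own covariance brings the quadratic term back.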
Common mistakes
  • Using QDA with too little data per class: each class needs enough samples to reliably estimate its own $d \times d$ covariance matrix, or the estimates become unstable or singular
  • Assuming QDA is always better than LDA because it's more flexible: extra flexibility costs variance. With limited data, LDA's shared-covariance bias can outperform QDA's lower-bias-but-higher-variance estimate
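The first mistake is easy to reproduce. As a minimal sketch in plain Python (2D only, with made-up points and helper names), the sample covariance of two points in two dimensions has determinant zero, so it cannot be inverted and its log-determinant is undefined. More generally, a sample covariance built from $n$ points has rank at most $n - 1$.

```python
def sample_cov_2d(points):
    """Unbiased sample covariance (sxx, sxy, syy) of a list of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    return sxx, sxy, syy

def det2(cov):
    """Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy

too_few = [(1.0, 2.0), (3.0, 5.0)]                        # n = 2 points, d = 2
enough = too_few + [(2.0, 2.5), (0.5, 4.0), (2.5, 1.0)]   # n = 5, not collinear

print(det2(sample_cov_2d(too_few)))   # singular: determinant is zero
print(det2(sample_cov_2d(enough)))    # positive: matrix is invertible
```

Two points always lie on a line, so the class has no measured spread perpendicular to it; the QDA score for that class cannot be evaluated until more samples arrive.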
LDA vs QDA

Two classes: Class 0 is a round blob, Class 1 is a thin diagonal streak. LDA (shared covariance) fits an average oval and draws a line, which is a poor fit. QDA models each class separately: Class 0 gets a circular Gaussian, Class 1 gets an elongated one. The curved boundary separates them much better.
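The blob-versus-streak scenario can be sketched in a few lines. This is an illustrative toy in plain Python, limited to 2D and using ten hand-picked points; the data and all function names are assumptions, not part of any library. In practice one would more likely reach for scikit-learn's `LinearDiscriminantAnalysis` and `QuadraticDiscriminantAnalysis`.

```python
import math

def mean_and_scatter(points):
    """Return the mean and the scatter sums (Sxx, Sxy, Syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return (mx, my), (sxx, sxy, syy)

def score(x, mean, cov, prior):
    """log prior - 0.5 * log|cov| - 0.5 * squared Mahalanobis distance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha

def fit(classes, shared):
    """classes maps label -> list of 2D points; shared=True gives LDA, False gives QDA."""
    n_total = sum(len(pts) for pts in classes.values())
    stats = {k: mean_and_scatter(pts) for k, pts in classes.items()}
    # Pooled covariance: summed scatter divided by (n_total - number of classes).
    pooled = tuple(
        sum(stats[k][1][i] for k in classes) / (n_total - len(classes))
        for i in range(3)
    )
    model = {}
    for k, pts in classes.items():
        mean, scatter = stats[k]
        own = tuple(s / (len(pts) - 1) for s in scatter)
        model[k] = (mean, pooled if shared else own, len(pts) / n_total)
    return model

def predict(model, x):
    """Return the label with the highest discriminant score."""
    return max(model, key=lambda k: score(x, *model[k]))

data = {
    0: [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)],        # round blob
    1: [(3, 3), (4, 4), (5, 5), (4.2, 3.8), (3.8, 4.2)],  # thin diagonal streak
}
lda = fit(data, shared=True)
qda = fit(data, shared=False)

query = (6, 2)  # nearer to the streak's mean, but well off its axis
print(predict(lda, query), predict(qda, query))  # the two models disagree here
```

On this toy data LDA assigns the query point to Class 1 because it is closer to that mean under the averaged covariance, while QDA assigns it to Class 0 because the point lies far outside the streak's narrow direction. Both models agree near the class means.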

Try it

When might you prefer LDA over QDA even if class covariances differ? Think about sample size.

Solution

QDA estimates $K$ covariance matrices, each of size $d \times d$: total $O(Kd^2)$ parameters for covariances alone. LDA estimates just one: $O(d^2)$.
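To make the gap concrete, the sketch below counts free covariance parameters exactly, using the fact that a symmetric $d \times d$ matrix has $d(d+1)/2$ distinct entries. The function name and the choice of three classes are illustrative.

```python
def covariance_params(d, n_classes, shared):
    """Free parameters in the covariance estimates (symmetric d x d matrices)."""
    per_matrix = d * (d + 1) // 2
    # LDA (shared=True) fits one matrix; QDA fits one per class.
    return per_matrix if shared else n_classes * per_matrix

for d in (2, 10, 50):
    lda_count = covariance_params(d, n_classes=3, shared=True)
    qda_count = covariance_params(d, n_classes=3, shared=False)
    print(f"d={d}: LDA {lda_count}, QDA {qda_count}")
```

With three classes and $d = 50$, QDA has 3825 covariance parameters to estimate against 1275 for LDA, and each QDA matrix is fit from only one class's samples.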

With small training sets, QDA's many parameters tend to overfit. LDA pools data across classes to estimate a single, more reliable covariance. The bias from the equal-covariance assumption may be worth the reduction in variance.

Rule of thumb: use LDA when $n/d$ is small (few training examples per dimension in each class); use QDA when you have ample data and class shapes clearly differ.

Related concepts