Quadratic Discriminant Analysis

Like LDA but allows each class its own covariance matrix — giving quadratic rather than linear decision boundaries.

Figure: QDA lets each class have its own covariance shape. Class 0 has a compact, tilted covariance ellipse; Class 1 has a wider ellipse with a different orientation; the differing covariance matrices leave a conic-section decision boundary.

Definition

Quadratic Discriminant Analysis (QDA) is a generative classifier similar to LDA, but each class has its own covariance matrix $\Sigma_k$ instead of a shared one.

Model: class $k$ follows $\mathcal{N}(\boldsymbol{\mu}_k, \Sigma_k)$ with prior $\pi_k$.

Because the quadratic terms in $\mathbf{x}$ no longer cancel (different $\Sigma_k$ per class), the decision boundary between classes is quadratic (a conic section in 2D — ellipse, parabola, or hyperbola).
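To make the non-cancellation concrete, the standard Gaussian log-posterior can be written out, dropping terms shared by all classes. The symbols $\delta_k$ (discriminant score), $\ell$ (a second class index), and $c_{k\ell}$ (a constant) are notation introduced here for illustration.

$$
\delta_k(\mathbf{x}) = -\tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^\top \Sigma_k^{-1} (\mathbf{x}-\boldsymbol{\mu}_k) + \log \pi_k
$$

Setting $\delta_k(\mathbf{x}) = \delta_\ell(\mathbf{x})$ gives the boundary between classes $k$ and $\ell$:

$$
-\tfrac{1}{2}\,\mathbf{x}^\top\left(\Sigma_k^{-1} - \Sigma_\ell^{-1}\right)\mathbf{x} + \left(\Sigma_k^{-1}\boldsymbol{\mu}_k - \Sigma_\ell^{-1}\boldsymbol{\mu}_\ell\right)^\top \mathbf{x} + c_{k\ell} = 0
$$

with

$$
c_{k\ell} = -\tfrac{1}{2}\boldsymbol{\mu}_k^\top \Sigma_k^{-1}\boldsymbol{\mu}_k + \tfrac{1}{2}\boldsymbol{\mu}_\ell^\top \Sigma_\ell^{-1}\boldsymbol{\mu}_\ell - \tfrac{1}{2}\log\frac{|\Sigma_k|}{|\Sigma_\ell|} + \log\frac{\pi_k}{\pi_\ell}
$$

The quadratic term vanishes only when $\Sigma_k = \Sigma_\ell$, which is exactly the LDA case.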

This gives QDA more flexibility than LDA: it can model classes whose spread and orientation differ from one another, whereas LDA forces every class to share the same shape.

Key properties

Decision boundaries are quadratic: in 2D they are conic sections (ellipses, parabolas, hyperbolas, or degenerate cases such as lines), and in higher dimensions quadric surfaces
Reduces exactly to LDA when all class covariances happen to be equal
More flexible than LDA, at the cost of estimating far more parameters
A generative model: it models the full class-conditional distribution, not just the boundary
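The reduction to LDA can be checked numerically. The sketch below compares the gap between two QDA scores under a shared covariance with LDA's linear rule $\mathbf{w}^\top\mathbf{x} + b$. The function name `qda_score` and every number in it are illustrative assumptions, not values from this page.

```python
import numpy as np

def qda_score(x, mu, sigma, prior):
    """Gaussian log-posterior, up to a constant shared by all classes."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    return -0.5 * logdet - 0.5 * maha + np.log(prior)

# Hypothetical two-class setup where both classes share one covariance matrix.
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
shared = np.array([[2.0, 0.6], [0.6, 1.0]])
pi0, pi1 = 0.4, 0.6

# LDA's linear rule: w = Sigma^{-1} (mu1 - mu0), plus an intercept b.
w = np.linalg.solve(shared, mu1 - mu0)
b = (-0.5 * (mu1 @ np.linalg.solve(shared, mu1) - mu0 @ np.linalg.solve(shared, mu0))
     + np.log(pi1 / pi0))

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
qda_gap = np.array([qda_score(x, mu1, shared, pi1) - qda_score(x, mu0, shared, pi0)
                    for x in points])
lda_gap = points @ w + b

print(np.allclose(qda_gap, lda_gap))  # prints True
```

Replacing `shared` with a different matrix for one of the classes should break the agreement, because the quadratic terms no longer cancel.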

Common mistakes

Using QDA with too little data per class: each class needs enough samples to reliably estimate its own $d\times d$ covariance matrix. With $d$ or fewer samples in a class the sample covariance is singular, and with only slightly more it is unstable
Assuming QDA is always better than LDA because it's more flexible: extra flexibility costs variance — with limited data, LDA's shared-covariance bias can outperform QDA's lower-bias-but-higher-variance estimate
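The first mistake is easy to reproduce. The sketch below draws synthetic Gaussian data for a single class (the dimension and sample sizes are arbitrary choices) and reports the rank of the sample covariance matrix. The rank is at most $n_k - 1$, so the matrix cannot be inverted until a class has more than $d$ samples.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10  # feature dimension (illustrative choice)

ranks = {}
for n_k in (5, 10, 200):  # number of samples available for one class
    X_k = rng.normal(size=(n_k, d))
    cov_k = np.cov(X_k, rowvar=False)  # d x d sample covariance
    ranks[n_k] = int(np.linalg.matrix_rank(cov_k))
    print(f"n_k={n_k:3d}  rank={ranks[n_k]:2d}  invertible={ranks[n_k] == d}")
```

Even when the matrix is technically invertible, a sample size only slightly above $d$ tends to give a badly conditioned estimate, which is the "unstable" case.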

LDA vs QDA

Two classes: Class 0 is a round blob, Class 1 is a thin diagonal streak. LDA (shared covariance) fits an average oval and draws a line — poor fit. QDA models each class separately — Class 0 gets a circular Gaussian, Class 1 gets an elongated one. The curved boundary separates them much better.
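A rough simulation of this scenario, written with NumPy only. The specific covariances, the sample sizes, and the choice to centre both classes at the origin are assumptions made to keep the contrast stark; `fit` and `predict` are hypothetical helper names, not a library API.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Class 0: round blob. Class 1: thin diagonal streak. Both centred at 0."""
    X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=n)
    X1 = rng.multivariate_normal([0, 0], [[4.0, 3.9], [3.9, 4.0]], size=n)
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

def fit(X, y, shared):
    """Per-class mean, covariance, prior; pool the covariances if shared (LDA)."""
    classes = np.unique(y)
    means = [X[y == k].mean(axis=0) for k in classes]
    covs = [np.cov(X[y == k], rowvar=False) for k in classes]
    priors = [np.mean(y == k) for k in classes]
    if shared:
        # Pooled covariance, weighting each class by its degrees of freedom.
        dof = [np.sum(y == k) - 1 for k in classes]
        pooled = sum(m * c for m, c in zip(dof, covs)) / sum(dof)
        covs = [pooled] * len(classes)
    return classes, means, covs, priors

def predict(model, X):
    """Assign each row of X to the class with the highest Gaussian log-posterior."""
    classes, means, covs, priors = model
    scores = []
    for mu, sigma, pi in zip(means, covs, priors):
        diff = X - mu
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
        scores.append(-0.5 * np.linalg.slogdet(sigma)[1] - 0.5 * maha + np.log(pi))
    return classes[np.argmax(np.array(scores), axis=0)]

X_train, y_train = sample(500)
X_test, y_test = sample(2000)
lda_acc = np.mean(predict(fit(X_train, y_train, shared=True), X_test) == y_test)
qda_acc = np.mean(predict(fit(X_train, y_train, shared=False), X_test) == y_test)
print(f"LDA accuracy: {lda_acc:.3f}   QDA accuracy: {qda_acc:.3f}")
```

Because the two class means coincide in this setup, LDA should land near chance, while QDA can exploit the difference in shape. Moving the class means apart would narrow the gap.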

Try it

When might you prefer LDA over QDA even if class covariances differ? Think about sample size.

Solution

QDA estimates $K$ covariance matrices, each of size $d \times d$ and symmetric: $K \cdot d(d+1)/2$ free parameters, i.e. $O(Kd^2)$, for covariances alone. LDA estimates just one: $d(d+1)/2$, i.e. $O(d^2)$.
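To get a feel for how quickly these counts grow, here is a small sketch that tallies the free covariance parameters, counting each symmetric $d \times d$ matrix as $d(d+1)/2$ entries. The function name `covariance_params` and the choice of three classes are illustrative.

```python
def covariance_params(d: int, n_classes: int, shared: bool) -> int:
    """Free parameters in the covariance estimates (symmetric d x d matrices)."""
    per_matrix = d * (d + 1) // 2
    return per_matrix if shared else n_classes * per_matrix

for d in (2, 10, 50):
    lda = covariance_params(d, n_classes=3, shared=True)
    qda = covariance_params(d, n_classes=3, shared=False)
    print(f"d={d:2d}  LDA: {lda:5d}  QDA: {qda:5d}")
```

With three classes and $d = 50$, QDA has 3825 covariance parameters to estimate against LDA's 1275, and each of QDA's matrices sees only one class's share of the data.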

With small training sets, QDA's many covariance parameters are estimated noisily, so the model tends to overfit. LDA pools data across classes to estimate a single, more reliable covariance. The bias from the equal-covariance assumption may be worth the reduction in variance.

Rule of thumb: use LDA when $n_k/d$ is small (few examples per class relative to the number of dimensions); use QDA when you have ample data and class shapes clearly differ.

Related concepts

Needs first

Linear Discriminant Analysis

Naive Bayes
