Mann-Whitney Test

A non-parametric test comparing two groups by ranking all observations — valid without normality assumptions.

Figure: Mann-Whitney rank strip (pool both groups, rank all observations).

Group A rank sum: W_A = 17. Expected under H₀: E[W] = 30. Group B rank sum: W_B = 49.

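Under H₀ (groups identical), ranks are randomly distributed between the groups, so each group's rank sum should land close to its expected value. Group A's sum of 17 falls well below the expected 30, while Group B's high rank sum suggests it tends to have larger values.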
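More precisely, the expected rank sum of a group depends on its size. The group sizes are not stated here; $n_A = 5$ and $n_B = 6$ below are an inference from the totals, since $17 + 49 = 66 = \frac{11 \cdot 12}{2}$ implies $N = 11$ pooled observations:

$$E[W_i] = \frac{n_i (N+1)}{2}, \qquad E[W_A] = \frac{5 \cdot 12}{2} = 30, \qquad E[W_B] = \frac{6 \cdot 12}{2} = 36$$

Under that reading, Group A sits 13 below its expectation and Group B sits 13 above its own.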

Definition

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric test that compares two independent groups without assuming normality.

Null hypothesis: the two populations have the same distribution (or, in the location-shift version, the same median).

Procedure:

Pool both groups and rank all observations (smallest = 1; tied values share the average of their ranks)
Compute $U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$, where $R_1$ is the sum of ranks for group 1
Compute $U_2 = n_1 n_2 - U_1$ (the two statistics sum to $n_1 n_2$)
Take $U = \min(U_1, U_2)$ as the test statistic and compare it to the Mann-Whitney distribution (or use the normal approximation when $n_1$ and $n_2$ are large)
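The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `midranks` and `mann_whitney_u` are made up for this example, and giving tied values their average rank is an assumption (a common convention).

```python
def midranks(values):
    """Rank values from smallest (rank 1); tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2, U) using the rank-sum formulas from the procedure."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))  # pool, then rank
    r1 = sum(ranks[:n1])                           # rank sum of group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return u1, u2, min(u1, u2)


print(mann_whitney_u([2.1, 3.4, 1.9, 4.0], [3.8, 5.2, 4.7, 6.1]))
# prints (15.0, 1.0, 1.0)
```

The sketch stops at the statistic itself; turning $U$ into a p-value still requires the exact distribution or the normal approximation.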

Key properties

Uses only the rank order of observations, never their actual numeric values
Valid for any continuous distribution — no normality assumption required
$U_1$ and $U_2$ always sum to exactly $n_1 n_2$
Nearly as powerful as the t-test under normality (asymptotic relative efficiency $3/\pi \approx 0.955$), and often more powerful when the data are skewed or heavy-tailed
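The sum identity in the list above can be checked directly. This sketch assumes the symmetric definition $U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$ (the rank-sum formula applied to group 2, which is not written out in this section) and uses the fact that all ranks together sum to $R_1 + R_2 = \frac{N(N+1)}{2}$ with $N = n_1 + n_2$:

$$
\begin{aligned}
U_1 + U_2 &= 2 n_1 n_2 + \frac{n_1(n_1+1) + n_2(n_2+1)}{2} - \frac{(n_1+n_2)(n_1+n_2+1)}{2} \\
&= 2 n_1 n_2 - n_1 n_2 = n_1 n_2
\end{aligned}
$$

The last step uses $(n_1+n_2)(n_1+n_2+1) = n_1(n_1+1) + n_2(n_2+1) + 2 n_1 n_2$.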

Common mistakes

Interpreting a significant result as "the medians differ": strictly, the test detects whether one group tends to produce larger values than the other (roughly, whether $P(X > Y) \neq 1/2$). If the two distributions differ in shape, not just location, "equal medians" is not the null hypothesis being tested
Ignoring ties: ties shrink the variance of $U$, so with many tied values the normal approximation needs a correction factor; omitting it makes the p-value too large
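For reference, a common form of the normal approximation, without and with the tie correction, is sketched below. The symbols $N = n_1 + n_2$ and $t_k$ (the number of observations sharing the $k$-th tied value) are notation introduced here, not taken from this section:

$$
z = \frac{U - \frac{n_1 n_2}{2}}{\sigma_U}, \qquad
\sigma_U^2 = \frac{n_1 n_2 (N+1)}{12} \ \text{(no ties)}, \qquad
\sigma_U^2 = \frac{n_1 n_2}{12}\left[(N+1) - \sum_k \frac{t_k^3 - t_k}{N(N-1)}\right] \ \text{(with ties)}
$$

Many implementations also apply a continuity correction of 0.5 to the numerator, which is omitted here for brevity.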

Comparing response times

Group A (new UI): $\{2.1, 3.4, 1.9, 4.0\}$, Group B (old UI): $\{3.8, 5.2, 4.7, 6.1\}$.

Pool and rank: 1.9(A), 2.1(A), 3.4(A), 3.8(B), 4.0(A), 4.7(B), 5.2(B), 6.1(B).

$R_A = 1+2+3+5 = 11$, $U_A = 4\cdot4 + \frac{4\cdot5}{2} - 11 = 16 + 10 - 11 = 15$, $U_B = 16 - 15 = 1$.

$U = \min(15, 1) = 1$. A $U$ this close to 0 means the two groups barely overlap in rank, and because Group A holds the low ranks, it has generally smaller values (faster).
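To see how strong this evidence is, the exact null distribution can be enumerated for samples this small. The sketch below is illustrative only: the function names are hypothetical, and it assumes no tied values. Under H₀ every choice of 4 ranks out of 8 for Group A is equally likely, giving $\binom{8}{4} = 70$ arrangements.

```python
from itertools import combinations


def u_from_rank_sum(r1, n1, n2):
    """U = min(U1, U2) computed from group 1's rank sum."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)


def exact_two_sided_p(group1, group2):
    """Exact two-sided p-value by full enumeration; assumes no ties."""
    n1, n2 = len(group1), len(group2)
    pooled = sorted(list(group1) + list(group2))
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}  # valid only without ties
    observed = u_from_rank_sum(sum(rank_of[v] for v in group1), n1, n2)

    # Count the rank assignments whose U is at least as extreme as observed.
    extreme = total = 0
    for subset in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if u_from_rank_sum(sum(subset), n1, n2) <= observed:
            extreme += 1
    return observed, extreme / total


u, p = exact_two_sided_p([2.1, 3.4, 1.9, 4.0], [3.8, 5.2, 4.7, 6.1])
print(u, round(p, 3))
# prints 1.0 0.057
```

Only 4 of the 70 arrangements give $U \le 1$, so the two-sided p-value is $4/70 \approx 0.057$. With four observations per group, even near-complete separation falls just short of significance at the 0.05 level.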

Try it

Why might you use the Mann-Whitney test instead of a two-sample t-test when comparing incomes of two professional groups?

Solution

Income distributions are typically right-skewed with heavy tails (a few very high earners). The two-sample t-test assumes (approximately) normal distributions, an assumption that fails badly with heavy-tailed skewed data. In small or moderate samples the test statistic then does not follow the t-distribution, so the actual Type I error rate can drift from its nominal level and power suffers.

The Mann-Whitney test uses only the ranks of the observations, not their actual values. It's insensitive to outliers and skewness. The test is valid for any continuous distribution — no normality required.

Additionally, income data may have practical outliers (billionaires in the sample) that massively influence the mean but barely affect the rank ordering.
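The outlier point can be illustrated with a short sketch. The income figures below are hypothetical, invented purely for illustration, and `u_statistic` is a made-up helper that counts pairs directly, which matches the rank-sum formula when there are no ties.

```python
from statistics import mean


def u_statistic(group1, group2):
    """U = min(U1, U2) by direct pair counting; assumes no tied values."""
    u1 = sum(1 for x in group1 for y in group2 if x < y)
    u2 = len(group1) * len(group2) - u1
    return min(u1, u2)


# Hypothetical incomes in thousands, invented for illustration only.
group_a = [42, 48, 51, 55, 60]
group_b = [45, 58, 63, 70, 75]
group_b_outlier = [45, 58, 63, 70, 75_000]  # top earner replaced by an extreme value

# The mean of group B changes by orders of magnitude...
print(mean(group_b), mean(group_b_outlier))
# ...while U is unchanged, because the rank ordering is the same.
print(u_statistic(group_a, group_b), u_statistic(group_a, group_b_outlier))
```

The extreme value is still just the largest observation, so it keeps the same rank and the test statistic does not move.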

Related concepts

Needs first

Hypothesis Testing, Sampling

Kruskal-Wallis Test, T-Test
