Mann-Whitney Test

A nonparametric test that compares two groups by ranking all observations. It is valid without normality assumptions.

[Figure: Hypothesis test, rejection region and test statistic. Two-sided test at α = 0.05 (α/2 = 0.025 in each tail) with critical values z = ±1.96. Observed z = 1.80, p-value ≈ 0.0719 > α = 0.05, so we fail to reject H₀.]

Definition

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric test that compares two independent groups without assuming normality.

Null hypothesis: the two populations have the same distribution (or, in the location-shift version, the same median).

Procedure:

  1. Pool both groups and rank all observations (smallest = 1)
  2. Compute $U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$, where $R_1$ is the sum of ranks for group 1
  3. Compute $U_2 = n_1 n_2 - U_1$ (so $U_1$ and $U_2$ sum to $n_1 n_2$)
  4. The test statistic is $U = \min(U_1, U_2)$; compare it to the exact Mann-Whitney null distribution (or use the normal approximation when $n_1$ and $n_2$ are large)
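
The four steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `average_ranks` and `mann_whitney_u` are hypothetical, and assigning tied values the average of their ranks is an assumption (a common convention that the procedure above does not spell out).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2, U) with U1 = n1*n2 + n1*(n1+1)/2 - R1."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))  # step 1: pool and rank
    r1 = sum(ranks[:n1])                                # rank sum of group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1               # step 2
    u2 = n1 * n2 - u1                                   # step 3
    return u1, u2, min(u1, u2)                          # step 4


# Illustrative input: every value in the first group is below the second.
print(mann_whitney_u([1, 2], [3, 4]))  # (4.0, 0.0, 0.0)
```

With this convention, $U_1$ equals the number of (group 1, group 2) pairs in which the group 1 value is the smaller one, which is why complete separation gives $U_1 = n_1 n_2$ and $U = 0$.
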
Key properties
  • Uses only the rank order of observations, never their actual numeric values
  • Valid for any continuous distribution: no normality assumption required
  • $U_1$ and $U_2$ always sum to exactly $n_1 n_2$
  • Nearly as powerful as the t-test under normality, and often more powerful when data is skewed or heavy-tailed
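
The sum property can be checked algebraically. The sketch below assumes the symmetric definition of $U_2$ in terms of its own rank sum $R_2$ (not stated above, where $U_2$ is defined as $n_1 n_2 - U_1$) and writes $N = n_1 + n_2$; it shows the two definitions agree when there are no ties or when tied values receive average ranks.

```latex
\begin{aligned}
R_1 + R_2 &= \frac{N(N+1)}{2}
           = \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} + n_1 n_2, \\
U_1 + U_2 &= \Bigl[n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1\Bigr]
           + \Bigl[n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2\Bigr] \\
          &= 2 n_1 n_2 + \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} - (R_1 + R_2) \\
          &= 2 n_1 n_2 - n_1 n_2 = n_1 n_2 .
\end{aligned}
```
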
Common mistakes
  • Interpreting a significant result as "the medians differ": strictly, the test detects whether values from one group tend to be larger than values from the other (stochastic dominance). If the two distributions differ in shape, not just location, "equal medians" is not the null hypothesis actually being tested
  • Ignoring ties: many tied values require a correction factor in the normal approximation; ignoring it can distort the p-value
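
To make the tie correction concrete, here is a rough sketch of the normal approximation with the usual tie-adjusted variance. The function name `u_normal_approx` is hypothetical, the formula used is the standard textbook one, $\sigma_U^2 = \frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum_k (t_k^3 - t_k)}{N(N-1)}\right]$ with $t_k$ the size of the $k$-th group of tied values and $N = n_1 + n_2$, and no continuity correction is applied (an assumption; some software applies one by default).

```python
import math
from collections import Counter


def u_normal_approx(u, group1, group2):
    """Return (z, two-sided p) for an observed U using a tie-corrected variance."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    # Size of each group of identical values in the pooled sample.
    tie_sizes = Counter(list(group1) + list(group2)).values()
    # Untied values have t = 1 and contribute t**3 - t = 0.
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    mean_u = n1 * n2 / 2
    # Note: sd_u is 0 if every pooled value is identical (z is undefined then).
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

Because ties shrink the variance, leaving the correction out makes $|z|$ too small and the p-value too large when many values are tied.
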
Comparing response times

Group A (new UI): $\{2.1, 3.4, 1.9, 4.0\}$, Group B (old UI): $\{3.8, 5.2, 4.7, 6.1\}$.

Pool and rank: 1.9(A), 2.1(A), 3.4(A), 3.8(B), 4.0(A), 4.7(B), 5.2(B), 6.1(B).

$R_A = 1+2+3+5 = 11$, $U_A = 4\cdot4 + 10 - 11 = 15$, $U_B = 16 - 15 = 1$.

$U = \min(15, 1) = 1$. A small $U$ means the two groups barely overlap in rank; because Group A holds the low ranks ($R_A = 11$), its values are generally smaller (faster).
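
To judge how unusual $U = 1$ is with four observations per group, one option is to enumerate the exact null distribution. The sketch below assumes no ties and a two-sided test that counts every rank assignment whose $U$ is at most the observed value; `exact_two_sided_p` is a hypothetical helper name.

```python
from itertools import combinations


def exact_two_sided_p(u_obs, n1, n2):
    """Exact two-sided p-value for U = min(U1, U2), assuming no ties."""
    n = n1 + n2
    total = 0
    extreme = 0
    # Under H0 every choice of n1 ranks for group 1 is equally likely.
    for ranks1 in combinations(range(1, n + 1), n1):
        u1 = n1 * n2 + n1 * (n1 + 1) // 2 - sum(ranks1)
        u = min(u1, n1 * n2 - u1)
        total += 1
        if u <= u_obs:
            extreme += 1
    return extreme / total


print(round(exact_two_sided_p(1, 4, 4), 4))  # 0.0571
```

Only 4 of the 70 possible rank assignments give $U \le 1$, so the exact two-sided p-value is $4/70 \approx 0.057$. Under these assumptions, even this near-complete separation is not significant at $\alpha = 0.05$ with such small samples.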

Try it

Why might you use the Mann-Whitney test instead of a two-sample t-test when comparing incomes of two professional groups?

Solution

Income distributions are typically right-skewed with heavy tails (a few very high earners). The two-sample t-test assumes (approximately) normal distributions, an assumption that fails badly with heavy-tailed, skewed data: the test statistic no longer follows the t-distribution, so the actual Type I error rate can differ from the nominal level, especially in small samples.

The Mann-Whitney test uses only the ranks of the observations, not their actual values, so it is far less sensitive to outliers and skewness. The test is valid for any continuous distribution; no normality is required.

Additionally, income data may have practical outliers (billionaires in the sample) that massively influence the mean but barely affect the rank ordering.
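
A small illustration of this point: the income figures below are invented for the example (in thousands), and `u_statistic` is a hypothetical helper that counts pairs directly, which is equivalent to the rank-sum formula for $U_1$. Replacing the largest income with an extreme value changes the mean enormously but leaves $U$ untouched.

```python
from statistics import mean


def u_statistic(group1, group2):
    """U1 by direct counting: pairs (x, y) with x < y, ties counted as 0.5."""
    return sum((x < y) + 0.5 * (x == y) for x in group1 for y in group2)


# Made-up incomes in thousands; not real data.
group_a = [42, 55, 61, 48]
group_b = [58, 75, 90, 120]
group_b_outlier = [58, 75, 90, 1_000_000]  # top earner replaced by a billionaire

print(mean(group_b), mean(group_b_outlier))  # 85.75 250055.75
print(u_statistic(group_a, group_b))          # 15.0
print(u_statistic(group_a, group_b_outlier))  # 15.0
```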

Related concepts