Frisch–Waugh–Lovell Theorem

Theorem

Let the model be y = X₁β₁ + X₂β₂ + ε. Let ỹ and X̃₁ denote the residuals from regressing y and X₁, respectively, on X₂. Then the following three regressions all yield the same β̂₁:

1. β̂₁ = the X₁ block of β̂ = (X′X)⁻¹X′y, where X = [X₁ X₂] (regress y on X₁ and X₂ jointly)
2. β̂₁ = (X̃₁′X̃₁)⁻¹X̃₁′ỹ (regress ỹ on X̃₁)
3. β̂₁ = (X̃₁′X̃₁)⁻¹X̃₁′y (regress y on X̃₁)
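
The equivalence of the three methods can be checked numerically. The following is a minimal sketch using NumPy on simulated data. The coefficient values (2.0 and -1.5), the sample size, the seed, and the helper names `ols` and `residualize` are illustrative assumptions, not part of the theorem. The control block X₂ here includes an intercept column.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 500

# Simulated data (assumed values): x1 is correlated with the control x2.
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X1 = x1[:, None]                        # regressor of interest, shape (n, 1)
X2 = np.column_stack([np.ones(n), x2])  # controls: intercept and x2

def ols(X, target):
    """Least-squares coefficients from regressing target on X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]

def residualize(A, controls):
    """Residuals of A after projecting it onto the columns of controls."""
    return A - controls @ ols(controls, A)

# Method 1: full regression of y on [X1, X2]; keep the X1 coefficient.
b_full = ols(np.column_stack([X1, X2]), y)[0]

# Method 2: regress residualized y on residualized X1.
y_tilde = residualize(y, X2)
X1_tilde = residualize(X1, X2)
b_fwl = ols(X1_tilde, y_tilde)[0]

# Method 3: regress raw y on residualized X1.
b_partial = ols(X1_tilde, y)[0]

print(b_full, b_fwl, b_partial)  # the three values agree up to rounding error
```

The three estimates coincide up to floating-point error, although the residuals (and hence the standard errors) from method 3 differ from those of methods 1 and 2.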

Intuition

ỹ and X̃₁ are the parts of y and X₁ orthogonal to X₂, that is, the variation the control does not explain. Regressing on these residuals isolates the partial effect of X₁. Method 3 gives the same coefficient because X̃₁ is orthogonal to X₂, so X̃₁′y = X̃₁′ỹ. The raw univariate regression of y on X₁ alone conflates the partial effect with the X₁–X₂ correlation, producing omitted variable bias whenever X₂ also affects y and is correlated with X₁. The bottom-right panel illustrates method 2; the sampling distributions below show the consequence for bias.
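
The bias of the raw univariate regression can be illustrated with a small Monte Carlo sketch. The data-generating process below (true coefficients 2.0 and -1.5, loading `gamma = 0.8` of X₁ on X₂, unit-variance noise, 200 observations per draw) and the function name `simulate` are assumptions made for illustration; only the count of 500 simulations is taken from the text. Under this setup the univariate slope converges to β₁ + β₂·Cov(X₁, X₂)/Var(X₁) rather than to β₁.

```python
import numpy as np

def simulate(n_sims=500, n=200, beta1=2.0, beta2=-1.5, gamma=0.8, seed=0):
    """Return (full-model, raw univariate) estimates of beta1 for each draw."""
    rng = np.random.default_rng(seed)
    ones = np.ones(n)
    full = np.empty(n_sims)
    raw = np.empty(n_sims)
    for s in range(n_sims):
        x2 = rng.normal(size=n)
        x1 = gamma * x2 + rng.normal(size=n)  # x1 is correlated with x2
        y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        # Full model: y on x1, x2, and an intercept (same beta1 estimate as FWL).
        X_full = np.column_stack([x1, x2, ones])
        full[s] = np.linalg.lstsq(X_full, y, rcond=None)[0][0]
        # Raw univariate: y on x1 and an intercept, omitting x2.
        X_raw = np.column_stack([x1, ones])
        raw[s] = np.linalg.lstsq(X_raw, y, rcond=None)[0][0]
    return full, raw

full, raw = simulate()

# Probability limit of the raw slope under the assumed design:
# beta1 + beta2 * Cov(x1, x2) / Var(x1) = 2.0 - 1.5 * 0.8 / 1.64, about 1.27.
raw_limit = 2.0 + (-1.5) * 0.8 / (0.8**2 + 1.0)

print("mean full-model estimate:", full.mean())
print("mean raw estimate:       ", raw.mean())
print("analytical raw limit:    ", raw_limit)
```

In this sketch the full-model estimates centre on the true value 2.0, while the raw estimates centre near the analytical limit, which is the omitted variable bias described above.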

[Figure: four scatter panels. (1) Raw Y vs X₁; (2) Residual Y vs X₁; (3) Y vs Residual X₁; (4, bottom right) ★ FWL: Residual Y vs Residual X₁. Legend: OLS fit, 95% CI, true slope.]

[Figure: Sampling distribution of β̂₁ across 500 simulations at current parameters. Legend: ★ Full model / FWL (unbiased); Raw univariate (biased).]