MAP: L2 Regularization is Bayesian

the Gaussian prior p(w)

[Interactive figure: the Gaussian prior p(w) with adjustable width σ (shown at σ = 1.00, where the peak density is 0.40).]
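Here is a minimal sketch of the prior, assuming a zero-mean Gaussian N(0, σ²) on a single weight w. The helper names are hypothetical and are not the page's own code. It reproduces the 0.40 readout, which equals 1/(σ√(2π)) at σ = 1. It also shows that −log p(w) is w²/(2σ²) plus a constant, which is exactly an L2 penalty.

```python
import math

def gaussian_prior(w: float, sigma: float = 1.0) -> float:
    """Density of the zero-mean Gaussian prior N(0, sigma^2) at weight w."""
    return math.exp(-w**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def neg_log_prior(w: float, sigma: float = 1.0) -> float:
    """-log p(w): the quadratic penalty w^2 / (2 sigma^2) plus a w-independent constant."""
    return w**2 / (2 * sigma**2) + math.log(sigma * math.sqrt(2 * math.pi))

# The density peaks at w = 0 with height 1 / (sigma * sqrt(2*pi)).
peak = gaussian_prior(0.0, sigma=1.0)
print(f"peak density = {peak:.2f}")  # peak density = 0.40
```

Halving σ doubles the peak height and quadruples the penalty on any given w.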

likelihood p(D|w)

[Interactive figure: draggable data points and the likelihood p(D|w) they induce, with adjustable noise σₙ (shown at 0.50) and the MLE ŵ marked.]
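This sketch of the likelihood assumes the one-parameter model yᵢ = w·xᵢ + εᵢ with εᵢ ~ N(0, σₙ²). That model is the one implied by the squared-error term ∑(yᵢ − wxᵢ)² later on the page. The data points are hypothetical stand-ins for the draggable ones, and the function names are illustrative.

```python
import math

# Hypothetical toy data, assuming y_i = w * x_i + Gaussian noise (no intercept).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.6, 0.9, 1.7, 2.1]

def log_likelihood(w, xs, ys, sigma_n=0.5):
    """log p(D|w) under i.i.d. Gaussian noise with standard deviation sigma_n."""
    n = len(xs)
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return -sse / (2 * sigma_n**2) - n * math.log(sigma_n * math.sqrt(2 * math.pi))

def mle(xs, ys):
    """Closed-form maximizer of log p(D|w): w_hat = sum(x*y) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

w_hat = mle(xs, ys)
print(f"MLE w_hat = {w_hat:.3f}")  # MLE w_hat = 1.060
```

In this model, σₙ scales the log-likelihood but not its maximizer. Moving the noise slider should therefore sharpen or flatten p(D|w) without shifting ŵ.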

posterior ∝ prior × likelihood

[Interactive figure: prior, likelihood, and posterior curves with the MAP and MLE estimates marked; prior σ (shown at 1.00) and noise σₙ (shown at 0.50) are adjustable.]
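This sketch of the posterior uses the same assumptions: hypothetical toy data, and the values prior σ = 1.00 and σₙ = 0.50 shown on the sliders. It finds the MAP estimate in two ways. The first is a brute-force grid search over the unnormalized log posterior. The second is the closed form obtained by setting that function's derivative to zero. The result is then compared with the MLE.

```python
# Same hypothetical toy data and model: y_i = w * x_i + N(0, sigma_n^2) noise.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.6, 0.9, 1.7, 2.1]
sigma, sigma_n = 1.0, 0.5  # prior width and noise level, as shown on the sliders

def log_posterior_unnorm(w):
    """log p(D|w) + log p(w), dropping every term that does not depend on w."""
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return -sse / (2 * sigma_n**2) - w**2 / (2 * sigma**2)

# Brute-force MAP: maximize the unnormalized log posterior on a grid with step 1e-4.
grid = [i / 10000 for i in range(-30000, 30001)]
w_map_grid = max(grid, key=log_posterior_unnorm)

# Closed form from d/dw = 0: w_map = sum(x*y) / (sum(x^2) + sigma_n^2 / sigma^2).
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
w_map = sxy / (sxx + sigma_n**2 / sigma**2)
w_mle = sxy / sxx  # the prior-free maximizer, for comparison

print(f"MAP {w_map:.4f} (grid {w_map_grid:.4f}), MLE {w_mle:.4f}")
```

With these numbers, the MAP estimate sits between 0 and the MLE. Shrinking σ pulls it further toward 0.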

the same optimization, two views

regularization view: ∑(yᵢ − wxᵢ)² + λw²

Bayesian view: −log p(w|D)
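This derivation assumes the same model as the widgets: yᵢ = wxᵢ plus Gaussian noise with standard deviation σₙ, and the prior w ~ N(0, σ²). Bayes' rule turns −log p(w|D) into a sum of the two negative log terms. The term log p(D) does not depend on w, so it joins the constant. Multiplying by the positive factor 2σₙ² leaves the minimizer unchanged and lines the result up with the regularization objective term by term:

```latex
\begin{aligned}
-\log p(w \mid D) &= -\log p(D \mid w) - \log p(w) + \log p(D) \\
&= \frac{1}{2\sigma_n^2}\sum_i (y_i - w x_i)^2 + \frac{w^2}{2\sigma^2} + \text{const} \\
2\sigma_n^2 \bigl(-\log p(w \mid D)\bigr) &= \sum_i (y_i - w x_i)^2 + \frac{\sigma_n^2}{\sigma^2}\, w^2 + \text{const}'
\end{aligned}
```

Matching the w² coefficients identifies λ with σₙ²/σ².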

[Interactive figure: both objectives plotted against w, with adjustable λ (shown at 1.00) and the corresponding prior width σ.]

λ = σₙ²/σ², so a narrower prior (smaller σ) or noisier data (larger σₙ) means stronger regularization.
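This numerical check of the equivalence uses hypothetical toy data and σₙ = 0.50, a value assumed from the earlier slider. The prior width is derived from λ as σ = σₙ/√λ. A grid search then minimizes each objective separately.

```python
import math

# Hypothetical toy data; model y_i = w * x_i + N(0, sigma_n^2) noise, prior w ~ N(0, sigma^2).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.6, 0.9, 1.7, 2.1]
sigma_n, lam = 0.5, 1.0

# Convert the regularization strength into a prior width via lambda = sigma_n^2 / sigma^2.
sigma = sigma_n / math.sqrt(lam)

def ridge_objective(w):
    """Regularization view: sum of squared errors plus lambda * w^2."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w**2

def neg_log_posterior(w):
    """Bayesian view: -log p(w|D), up to an additive constant."""
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return sse / (2 * sigma_n**2) + w**2 / (2 * sigma**2)

# Minimize each objective independently on the same grid (step 1e-4).
grid = [i / 10000 for i in range(-30000, 30001)]
w_ridge = min(grid, key=ridge_objective)
w_bayes = min(grid, key=neg_log_posterior)
print(w_ridge, w_bayes)
```

One objective is a positive multiple of the other plus a constant. Both searches should therefore return the same grid point, and that point should also agree with the closed form ∑xᵢyᵢ / (∑xᵢ² + λ).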

Both views share the same minimizer ŵ.

playground: MLE vs MAP polynomial fit

[Interactive figure: draggable points refit by a degree-6 polynomial, comparing the MLE fit with the MAP (ridge) fit at λ = 0.50.]
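This NumPy sketch reproduces the playground's two fits. It assumes synthetic data (a noisy sine curve, made up here) and the slider values of degree 6 and λ = 0.50. Under Gaussian noise, the MLE is ordinary least squares. Under an isotropic Gaussian prior on the coefficients, the MAP estimate is ridge regression, which this sketch solves through the regularized normal equations. The page does not say whether it penalizes the constant term, so this sketch simply penalizes all coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a smooth curve plus Gaussian noise (sigma_n = 0.5 assumed).
x = np.linspace(-1.0, 1.0, 10)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.5, size=x.shape)

degree, lam = 6, 0.5  # the values shown on the playground sliders

# Design matrix with columns 1, x, x^2, ..., x^degree.
X = np.vander(x, degree + 1, increasing=True)

# MLE under Gaussian noise = ordinary least squares.
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# MAP under a N(0, sigma^2 I) prior = ridge: solve (X^T X + lam I) w = X^T y.
# This sketch penalizes every coefficient, including the constant term.
w_map = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

print("MLE coefficient norm:", np.linalg.norm(w_mle))
print("MAP coefficient norm:", np.linalg.norm(w_map))
```

The MLE always has the lower training error, and the ridge coefficients have the smaller norm. That smaller norm tends to keep a degree-6 curve from swinging wildly between points.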