L1 vs L2 Loss

loss as a function of error

Plot: L2 loss, e², and L1 loss, |e|, against the error e.
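The two curves cross at |e| = 1. Below that point squaring shrinks an error, and above it squaring amplifies it. The sketch below tabulates both losses for a few hypothetical error values. The helper names `l2_loss` and `l1_loss` are illustrative only.

```python
def l2_loss(e):
    """Squared error: e**2."""
    return e * e

def l1_loss(e):
    """Absolute error: |e|."""
    return abs(e)

# Hypothetical error values spanning |e| < 1, |e| = 1 and |e| > 1.
for e in (0.1, 0.5, 1.0, 2.0, 10.0):
    ratio = l2_loss(e) / l1_loss(e)  # L2 / L1 equals |e|
    print(f"e = {e:4}: L2 = {l2_loss(e):g}, L1 = {l1_loss(e):g}, L2/L1 = {ratio:g}")
```

An error ten times larger costs 100 times more under L2 but only 10 times more under L1. This is why a few large errors can dominate an L2 objective.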

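regression: fitting a line to data with outliers

Plot: data points with two fitted lines, one minimizing L2 loss (mean squared error, MSE) and one minimizing L1 loss (mean absolute error, MAE), with the slope of each line.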
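The following sketch shows how far one outlier can drag each fit. It uses hypothetical data: five points on y = 2x + 1 plus one outlier. The L2 line comes from the closed-form least-squares solution. The L1 (least absolute deviations) line uses a brute-force search that relies on a known property: some optimal LAD line passes through at least two of the data points. This is fine for a handful of points but is not how a production solver works. The function names are hypothetical.

```python
# Hypothetical data: five points on y = 2x + 1, plus an outlier at x = 5
# (the line would predict 11 there).
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 30]

def l2_fit(xs, ys):
    """Ordinary least squares for y = m*x + c, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

def l1_fit(xs, ys):
    """Least absolute deviations for y = m*x + c.

    Tries the line through every pair of points with distinct x and keeps the
    one with the smallest sum of absolute residuals (O(n^3), small n only).
    """
    best = None
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                continue  # vertical pair: no line of the form y = m*x + c
            m = (ys[j] - ys[i]) / (xs[j] - xs[i])
            c = ys[i] - m * xs[i]
            cost = sum(abs(y - (m * x + c)) for x, y in zip(xs, ys))
            if best is None or cost < best[0]:
                best = (cost, m, c)
    return best[1], best[2]

print("L2 slope:", round(l2_fit(xs, ys)[0], 3))  # 4.714
print("L1 slope:", l1_fit(xs, ys)[0])            # 2.0
```

A single outlier more than doubles the L2 slope. The L1 line stays on the five inliers, because moving toward the outlier would add more absolute error on those five points than it removes on the one.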

gradient: how each loss reacts to errors

Plot: the L2 gradient, 2e, and the L1 gradient, sign(e), against the error e.

The L2 gradient grows linearly with the error. The L1 gradient has constant magnitude 1, and only its sign follows the error (it is undefined at e = 0).
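To see what this means for a training step, the sketch below uses hypothetical residuals: three small errors and one outlier. It measures how much of the total gradient magnitude comes from the outlier under each loss. Defining sign(0) = 0 is one valid choice of L1 subgradient; it is an assumption, not something the page specifies.

```python
def sign(v):
    """sign(v) in {-1, 0, 1}; returning 0 at v == 0 picks one valid L1 subgradient."""
    return (v > 0) - (v < 0)

# Hypothetical residuals: three small errors and one outlier (the last entry).
errors = [0.5, -0.3, 0.2, 10.0]
l2_grads = [2 * e for e in errors]    # d(e^2)/de = 2e
l1_grads = [sign(e) for e in errors]  # d|e|/de = sign(e) for e != 0

def outlier_share(grads):
    """Fraction of the total gradient magnitude contributed by the last error."""
    return abs(grads[-1]) / sum(abs(g) for g in grads)

print(f"L2 outlier share: {outlier_share(l2_grads):.2f}")  # 0.91
print(f"L1 outlier share: {outlier_share(l1_grads):.2f}")  # 0.25
```

Under L2 the outlier supplies about 91% of the gradient signal. Under L1 every point contributes equally in magnitude, whatever its error.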

huber loss: the compromise

Plot: L2, L1, and Huber loss against the error e, with threshold δ = 1.0.

Huber loss is quadratic for |e| ≤ δ and linear for |e| > δ, so it behaves like L2 on small errors and like L1 on large ones.
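A minimal sketch of Huber loss and its derivative, assuming the common convention L(e) = ½e² for |e| ≤ δ and δ(|e| − ½δ) otherwise. The page does not state its scaling. With this convention the quadratic part is half the e² curve used for L2 above. The function names are illustrative.

```python
def huber(e, delta=1.0):
    """Huber loss, 1/2-scaled convention: quadratic for |e| <= delta, linear beyond."""
    if abs(e) <= delta:
        return 0.5 * e * e
    return delta * (abs(e) - 0.5 * delta)

def huber_grad(e, delta=1.0):
    """Derivative of huber: e inside the threshold, delta * sign(e) outside."""
    if abs(e) <= delta:
        return e
    return delta if e > 0 else -delta

for e in (0.5, 1.0, 3.0, -3.0):
    print(f"e = {e:4}: loss = {huber(e)}, grad = {huber_grad(e)}")
```

At |e| = δ both pieces have value δ²/2 and slope δ, so the loss is continuous with a continuous first derivative. Beyond δ the gradient is clipped at ±δ, which caps how hard any single outlier can pull, as with L1.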

signal fitting: same network, different loss

Plot: the true signal and the fits of an L2-trained and an L1-trained network, alongside each network's MSE over 3000 training epochs.
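The page does not describe the network's architecture. The sketch below therefore swaps in a much simpler model, a two-parameter fit a·sin(x) + b, trained the same way under each loss. It uses hypothetical data: one period of sin(x) at 20 points, with two +5 spikes added where sin(x) is zero. Training is plain full-batch (sub)gradient descent for 3000 epochs. The learning rate, data, and function names are all assumptions made for illustration.

```python
import math

# Hypothetical data: one period of sin(x) at 20 points, with +5 spikes
# added at x = 0 and x = pi (where sin(x) is zero).
N = 20
xs = [2 * math.pi * i / N for i in range(N)]
ys = [math.sin(x) for x in xs]
for i in (0, 10):
    ys[i] += 5.0

def sign(v):
    return (v > 0) - (v < 0)

def train(loss, lr=0.01, epochs=3000):
    """Full-batch (sub)gradient descent on the model a*sin(x) + b."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            r = a * math.sin(x) + b - y  # residual = prediction - target
            # d(loss)/dr: 2r for squared error, sign(r) for absolute error
            g = 2 * r if loss == "l2" else sign(r)
            ga += g * math.sin(x)
            gb += g
        a -= lr * ga / N  # gradients averaged over the N points
        b -= lr * gb / N
    return a, b

for loss in ("l2", "l1"):
    a, b = train(loss)
    print(f"{loss}: a = {a:.3f}, b = {b:.3f}")
```

With this data the L2-trained model converges to an offset of b ≈ 0.5, because the spikes raise the mean of the targets. The L1-trained model settles near the clean signal (a ≈ 1, b ≈ 0), oscillating slightly because subgradient descent with a fixed step does not converge exactly.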