[Interactive figure: activation function explorer. Drag along the curve to read x, f(x), and f'(x). Functions: Sigmoid, Tanh, ReLU, Leaky ReLU, GELU, Swish.]
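The six functions named in the explorer, together with the derivatives f'(x) it reads out, can be sketched in plain Python. This is a minimal sketch, not the page's own code. The function names, the Leaky ReLU slope of 0.01, the Swish form with β = 1, the exact (erf-based) GELU rather than its tanh approximation, and the choice f'(0) = 0 for ReLU are all assumptions, since the page does not state them.

```python
import math

LEAKY_SLOPE = 0.01  # assumption: the page does not state its negative slope


def _phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x):
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x):
    return math.tanh(x)


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return x if x > 0 else 0.0


def d_relu(x):
    # Not differentiable at 0; this sketch assumes the value 0 there.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x):
    return x if x > 0 else LEAKY_SLOPE * x


def d_leaky_relu(x):
    return 1.0 if x > 0 else LEAKY_SLOPE


def gelu(x):
    # Exact form x * Phi(x); assumed here instead of the tanh approximation.
    return x * _Phi(x)


def d_gelu(x):
    # Product rule: d/dx [x * Phi(x)] = Phi(x) + x * phi(x).
    return _Phi(x) + x * _phi(x)


def swish(x):
    # Swish with beta = 1 (assumption), i.e. x * sigmoid(x).
    return x * sigmoid(x)


def d_swish(x):
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


# Map each name to its (function, derivative) pair.
ACTIVATIONS = {
    "Sigmoid": (sigmoid, d_sigmoid),
    "Tanh": (tanh, d_tanh),
    "ReLU": (relu, d_relu),
    "Leaky ReLU": (leaky_relu, d_leaky_relu),
    "GELU": (gelu, d_gelu),
    "Swish": (swish, d_swish),
}

if __name__ == "__main__":
    x = 1.0
    for name, (f, df) in ACTIVATIONS.items():
        print(f"{name:<11} f({x}) = {f(x):.4f}   f'({x}) = {df(x):.4f}")
```

Keeping each function next to its derivative mirrors the explorer's paired f(x) and f'(x) readouts and makes it easy to check a derivative against a finite difference.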
[Interactive figure: classification with a linear vs. nonlinear network. Toggle between Linear (no activation) and Nonlinear (ReLU).]
[Interactive figure: Gaussian CDF Φ(x) and PDF φ(x). Drag to explore; readouts: x, Φ(x), φ(x).]
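The two quantities read out by this panel can be computed with the standard library alone. The sketch below assumes the standard normal distribution (mean 0, variance 1), which the page does not state explicitly. The names `gaussian_pdf` and `gaussian_cdf` are illustrative.

```python
import math


def gaussian_pdf(x):
    """Standard normal density: phi(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x = {x:+.1f}   Phi(x) = {gaussian_cdf(x):.4f}   phi(x) = {gaussian_pdf(x):.4f}")
```

Since φ is the derivative of Φ, the PDF readout is the slope of the CDF curve at the same x.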
[Interactive figure: vanishing gradient, gradient magnitude by layer. Series: Sigmoid, Tanh, ReLU, Leaky ReLU, GELU, Swish. Control: layers (shown: 5). Readout: gradient at layer 1 for each function.]