The question
An SDF sampled on a 24³ grid is a 13,824-dimensional vector. But the shapes living in it are far simpler: a sphere is one number (radius), a box is three, a torus is two. PCA is the easiest tool for probing that gap: how many linear components of the SDF does it take to reconstruct the shape?
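To make the 13,824 figure concrete, here is a minimal sketch of sampling one shape. The grid extent of $[-1, 1]^3$, the radius, and the function name `sphere_sdf_grid` are assumptions for illustration, not details taken from the demo.

```python
import numpy as np

def sphere_sdf_grid(radius, n=24, extent=1.0):
    """Sample a sphere's signed distance on an n^3 grid over [-extent, extent]^3."""
    axis = np.linspace(-extent, extent, n)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    # Signed distance: negative inside, zero on the surface, positive outside.
    return np.sqrt(x**2 + y**2 + z**2) - radius

grid = sphere_sdf_grid(radius=0.5)   # one number (the radius) defines the shape...
vector = grid.ravel()                # ...but its sampled SDF is one long vector
print(grid.shape, vector.shape)
```

The shape is fully described by `radius`, yet the vector PCA sees has one entry per voxel.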
Six panels: ground truth on the left, then reconstructions at 2 PCs, 90%, 99%, 99.9%, and 99.99% variance retention. The cliff between 99% and 99.9% is where sharp surface details live: corners, exact curvature. Click through shapes or hit regenerate for a fresh dataset.
Explained variance
The eigenvalue spectrum shows how variance is distributed across principal components. The first few components capture the dominant shape modes (overall size, elongation), while later components encode finer details. The white cumulative curve shows how quickly total variance is recovered: 99% of the variance typically needs only 8-12 of the ~40 available components.
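The cumulative curve is just a running sum of normalized eigenvalues. A small sketch, using a made-up geometric spectrum in place of the real one (the decay rate `0.6` and the helper `components_needed` are hypothetical):

```python
import numpy as np

# Hypothetical spectrum: 40 eigenvalues decaying geometrically (illustration only).
eigenvalues = 0.6 ** np.arange(40)

ratio = eigenvalues / eigenvalues.sum()   # fraction of total variance per component
cumulative = np.cumsum(ratio)             # the cumulative curve

def components_needed(cumulative, target):
    """Smallest K whose first K components reach the target variance fraction."""
    return int(np.searchsorted(cumulative, target) + 1)

for target in (0.90, 0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> K = {components_needed(cumulative, target)}")
```

Each extra "9" of retained variance costs more components than the last, which is why the high-retention panels need so many more directions than the 90% one.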
The latent space
Project each shape onto the first two principal components and the categories separate on their own. Spheres land in one neighborhood, boxes in another. No labels were used. PCA found this structure from the raw SDF vectors alone.
This is the core insight behind learned shape spaces: if linear PCA already separates categories this cleanly, a nonlinear method (autoencoders, neural implicits) can build far richer representations in the same dimensionality.
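A rough sketch of the projection step on a tiny stand-in dataset: eight spheres and eight cubes on a 12³ grid. The dataset, grid size, and helper names are assumptions, and the sketch uses NumPy's SVD as a shortcut to the principal directions rather than the eigendecomposition route built up in the math section.

```python
import numpy as np

def grid_points(n=12):
    """All n^3 sample positions in [-1, 1]^3, one row per voxel."""
    axis = np.linspace(-1.0, 1.0, n)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

P = grid_points()

def sphere_sdf(radius):
    return np.linalg.norm(P, axis=1) - radius

def cube_sdf(half):
    q = np.abs(P) - half                      # per-axis distance past the faces
    return np.linalg.norm(np.maximum(q, 0.0), axis=1) + np.minimum(q.max(axis=1), 0.0)

sizes = np.linspace(0.3, 0.8, 8)
X = np.array([sphere_sdf(s) for s in sizes] + [cube_sdf(s) for s in sizes])
labels = ["sphere"] * 8 + ["cube"] * 8        # used only for printing, never by PCA

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt: principal directions
coords = Xc @ Vt[:2].T                               # 2D latent coordinates per shape

for label, (a, b) in zip(labels, coords):
    print(f"{label:>6}  PC1 = {a:+.2f}  PC2 = {b:+.2f}")
```

Plotting `coords` colored by `labels` gives a scatter of the same kind as the latent-space view, on this toy dataset.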
The math
The sections above showed what PCA produces. Now let's build the algorithm from scratch, starting with a 2D scatter plot and working up to the 13,824-dimensional SDF case. No linear algebra prerequisites needed.
Spread and direction
Imagine a cloud of 2D points. Some directions through the cloud have a lot of spread (points are far apart), and others have very little (points are bunched together). If you had to summarize the cloud with a single arrow, you would point it in the direction where the data varies the most. That arrow is the first principal component (PC1).
The second principal component (PC2) is the direction with the most remaining spread, perpendicular to the first. In 2D there are only two directions to find. In higher dimensions you keep going: each new component captures the most variance not yet accounted for, always perpendicular to all previous ones. The number of components you can extract is at most $\min(\textcolor{#8CB4D5}{M}, D)$, the smaller of the number of samples and the number of dimensions. You can never find more independent directions than you have data points (each new point adds at most one new direction to the span), and you can never find more than the dimensionality of the space itself. For the SDF dataset, $\textcolor{#8CB4D5}{M = 48}$ shapes in $D = 13{,}824$ dimensions, so at most 48 components exist. (Strictly, subtracting the mean uses up one of those directions, because the centered points sum to zero, so at most 47 can carry nonzero variance.) In practice far fewer carry meaningful variance.
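The component limit is easy to check numerically. This sketch uses random toy data (5 samples in 20 dimensions, sizes chosen arbitrarily) and counts independent directions with `np.linalg.matrix_rank`. Subtracting the mean removes one more direction, because the centered rows sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 5, 20                          # few samples, many dimensions (toy sizes)
X = rng.normal(size=(M, D))

rank_raw = np.linalg.matrix_rank(X)                        # limited by M, not D
rank_centered = np.linalg.matrix_rank(X - X.mean(axis=0))  # one fewer after centering
print(rank_raw, rank_centered)
```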
Click "project onto PC1" to collapse every point onto the PC1 line. The cyan dots are the projected positions: 2D data compressed to 1D. The dashed lines are what gets lost. PCA chooses the direction that makes those dashed lines as short as possible. That is the whole idea: find the line that preserves the most structure.
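The claim that the max-spread direction is also the shortest-dashed-lines direction can be checked by brute force. A sketch on a small centered cloud of four points (chosen for illustration), trying one candidate direction per degree:

```python
import numpy as np

X = np.array([[-3.0, -2.0], [1.0, 2.0], [3.0, 0.0], [-1.0, 0.0]])  # centered toy cloud

angles = np.deg2rad(np.arange(180))                    # candidate directions, 1 degree apart
U = np.column_stack([np.cos(angles), np.sin(angles)])  # one unit vector per row

proj = X @ U.T                                         # position of each point along each line
variance = (proj**2).mean(axis=0)                      # spread kept by each direction
# Residual: squared length of the dashed line from each point to its projection.
residual = ((X[:, None, :] - proj[:, :, None] * U[None, :, :])**2).sum(axis=2).mean(axis=0)

best = int(np.argmax(variance))
print(best, int(np.argmin(residual)))                  # the same angle wins both ways
```

Kept variance plus residual is the same constant for every direction (the cloud's total variance), so maximizing one is the same as minimizing the other.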
Centering the data
Before finding directions, shift the entire cloud so its center sits at the origin. This is done by computing the mean and subtracting it from every point:
$\textcolor{#7F77DD}{\boldsymbol{\mu}} = \frac{1}{\textcolor{#8CB4D5}{M}}\sum_{i=1}^{\textcolor{#8CB4D5}{M}} \mathbf{x}_i$
| $\textcolor{#7F77DD}{\boldsymbol{\mu}}$ | The mean (center of mass). For SDFs, this is the "average shape," computed voxel by voxel. |
| $\textcolor{#8CB4D5}{M}$ | Number of data points (or shapes). In the SDF dataset, $\textcolor{#8CB4D5}{M = 48}$. |
| $\mathbf{x}_i$ | A single data point. For SDFs, one flattened grid of 13,824 values. |
Why center? PCA finds directions of maximum spread from a reference point. If the data is off-center, the strongest "direction" would just point from the origin toward the cloud, which tells you nothing about the cloud's shape. Centering removes that offset so every direction PCA finds reflects genuine variation in the data.
In SDF terms, the mean shape is a blurry average of all 48 shapes in the dataset. Centering makes PCA discover how each shape differs from average, not where shapes happen to sit in value-space.
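A short sketch of centering, and of what goes wrong without it. The data here is a random stand-in (48 rows of 100 values, deliberately offset from the origin), not real SDFs, and the sketch uses an SVD of the raw data only to expose its strongest direction.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 48 "shapes", 100 values each, sitting far from the origin.
X = rng.normal(loc=5.0, scale=1.0, size=(48, 100))

mu = X.mean(axis=0)        # mean shape, computed column by column (voxel by voxel)
X_centered = X - mu        # broadcasting subtracts mu from every row

print(np.abs(X_centered.mean(axis=0)).max())   # essentially zero after centering

# Without centering, the strongest direction of the raw data just points at the mean.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
alignment = abs(Vt[0] @ mu) / np.linalg.norm(mu)   # cosine between that direction and mu
print(alignment)
```

An `alignment` close to 1 means the first direction of the uncentered data is spent describing the offset rather than the variation.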
Variance and the covariance matrix
Variance ($\textcolor{#E8725C}{\sigma^2}$) measures how spread out values are along a single axis. Take the distance of each value from the mean, square it, and average:
$\textcolor{#E8725C}{\sigma_x^2} = \text{var}(x) = \frac{1}{\textcolor{#8CB4D5}{M}}\sum_{i=1}^{\textcolor{#8CB4D5}{M}}(x_i - \textcolor{#7F77DD}{\mu_x})^2$
Squaring makes all distances positive and penalizes far-away points more. High variance means the data is spread out; low variance means it is clustered near the mean.
Covariance ($\textcolor{#D36BE0}{\sigma_{xy}}$) measures how much two axes vary together. Replace the squared single-axis term with a product across both axes:
$\textcolor{#D36BE0}{\sigma_{xy}} = \text{cov}(x, y) = \frac{1}{\textcolor{#8CB4D5}{M}}\sum_{i=1}^{\textcolor{#8CB4D5}{M}}(x_i - \textcolor{#7F77DD}{\mu_x})(y_i - \textcolor{#7F77DD}{\mu_y})$
Positive covariance means x and y tend to move together. Negative means they tend to move in opposite directions. Near zero means there is no linear relationship between them (which is weaker than full independence).
The covariance matrix $\textcolor{#C9A227}{\mathbf{C}}$ packages these into one object: variances on the diagonal, covariances off-diagonal:
$\textcolor{#C9A227}{\mathbf{C}} = \begin{bmatrix} \textcolor{#E8725C}{\sigma_x^2} & \textcolor{#D36BE0}{\sigma_{xy}} \\\\ \textcolor{#D36BE0}{\sigma_{xy}} & \textcolor{#E8725C}{\sigma_y^2} \end{bmatrix}$
For a toy example, suppose we have four 2D points (already centered): $(-3, -2)$, $(1, 2)$, $(3, 0)$, $(-1, 0)$. Stack them as rows of a matrix $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}$:
$\textcolor{#E07B9D}{\tilde{\mathbf{X}}} = \begin{bmatrix} -3 & -2 \\\\ 1 & 2 \\\\ 3 & 0 \\\\ -1 & 0 \end{bmatrix}$
The covariance matrix is $\textcolor{#C9A227}{\mathbf{C}} = \frac{1}{\textcolor{#8CB4D5}{M}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \textcolor{#E07B9D}{\tilde{\mathbf{X}}}$. Multiply the transpose by the original and divide by 4:
$\textcolor{#C9A227}{\mathbf{C}} = \frac{1}{\textcolor{#8CB4D5}{4}}\begin{bmatrix} -3 & 1 & 3 & -1 \\\\ -2 & 2 & 0 & 0 \end{bmatrix} \begin{bmatrix} -3 & -2 \\\\ 1 & 2 \\\\ 3 & 0 \\\\ -1 & 0 \end{bmatrix} = \frac{1}{\textcolor{#8CB4D5}{4}}\begin{bmatrix} 20 & 8 \\\\ 8 & 8 \end{bmatrix} = \begin{bmatrix} \textcolor{#E8725C}{5} & \textcolor{#D36BE0}{2} \\\\ \textcolor{#D36BE0}{2} & \textcolor{#E8725C}{2} \end{bmatrix}$
Variance along x is $\textcolor{#E8725C}{5}$ (top-left), variance along y is $\textcolor{#E8725C}{2}$ (bottom-right), and the covariance is $\textcolor{#D36BE0}{2}$ (off-diagonal). The x-axis has more spread than the y-axis, and the positive covariance tells you: when x is large, y tends to be large too.
| $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}$ | The centered data matrix. Each row is one data point with the mean subtracted. Size $\textcolor{#8CB4D5}{M} \times D$ (samples by dimensions). |
| $\textcolor{#C9A227}{\mathbf{C}}$ | Covariance matrix. Size $D \times D$. Diagonal: variances ($\textcolor{#E8725C}{\sigma^2}$). Off-diagonal: covariances ($\textcolor{#D36BE0}{\sigma_{xy}}$). |
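The toy covariance matrix can be reproduced in a few lines. This sketch also cross-checks against `np.cov`, which divides by $M - 1$ unless `bias=True` is passed. The formulas here divide by $M$.

```python
import numpy as np

X = np.array([[-3.0, -2.0], [1.0, 2.0], [3.0, 0.0], [-1.0, 0.0]])  # rows are points
M = X.shape[0]
print(X.mean(axis=0))            # the toy data is already centered

C = X.T @ X / M                  # covariance with the 1/M convention
print(C)

# Cross-check with NumPy: columns are variables, and bias=True selects 1/M.
C_numpy = np.cov(X, rowvar=False, bias=True)
print(np.allclose(C, C_numpy))
```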
Here is the key property: if you pick any unit-length direction $\mathbf{u}$ and project the data onto it, the variance of the projected values is $\mathbf{u}^\top \textcolor{#C9A227}{\mathbf{C}} \mathbf{u}$. PCA asks: which $\mathbf{u}$ maximizes this? The answer turns out to be an eigenvector of $\textcolor{#C9A227}{\mathbf{C}}$.
Why? Because $\textcolor{#C9A227}{\mathbf{C}}$ is symmetric ($C_{ij} = C_{ji}$, the covariance of x with y equals the covariance of y with x). Symmetric matrices have a special guarantee: their eigenvectors can be chosen perpendicular to each other, and the eigenvalues are real numbers. That means the eigenvectors form a clean set of axes, and the eigenvalue of each axis is the variance along that axis: a unit eigenvector satisfies $\textcolor{#C9A227}{\mathbf{C}}\mathbf{u} = \lambda\mathbf{u}$ (defined in the next section), so $\mathbf{u}^\top \textcolor{#C9A227}{\mathbf{C}} \mathbf{u} = \lambda\,\mathbf{u}^\top\mathbf{u} = \lambda$. The direction with the largest eigenvalue has the most variance, which is exactly what PCA is looking for.
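One way to fill in the step from "symmetric" to "largest eigenvalue wins" is sketched below. It assumes unit-length, mutually perpendicular eigenvectors $\mathbf{u}_k$ with $\textcolor{#C9A227}{\mathbf{C}}\mathbf{u}_k = \lambda_k \mathbf{u}_k$, sorted so that $\lambda_1$ is the largest. The coefficients $a_k$ are notation introduced only for this argument. Because the eigenvectors form a full set of axes, any unit direction is a mix of them:

$\mathbf{u} = \sum_k a_k \mathbf{u}_k, \qquad \sum_k a_k^2 = 1$

Substitute this into the projected variance. Every cross term vanishes because the axes are perpendicular:

$\mathbf{u}^\top \textcolor{#C9A227}{\mathbf{C}} \mathbf{u} = \sum_k \lambda_k a_k^2 \le \lambda_1 \sum_k a_k^2 = \lambda_1$

The projected variance is a weighted average of the eigenvalues, so it can never exceed the largest one. If $\lambda_1$ is strictly the largest, equality holds only when all the weight sits on $\mathbf{u}_1$.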
Eigenvectors: the directions that matter
A matrix transforms vectors: it stretches them, rotates them, or both. Most vectors get changed in complicated ways. But certain special directions only get stretched (or shrunk), with no rotation at all. These are the eigenvectors, and the stretch factor for each one is its eigenvalue:
$\textcolor{#C9A227}{\mathbf{C}}\mathbf{u} = \lambda \mathbf{u}$
Read this as: "when I apply $\textcolor{#C9A227}{\mathbf{C}}$ to the vector $\mathbf{u}$, the result points in the same direction as $\mathbf{u}$, just scaled by $\lambda$." Most vectors would get rotated. Eigenvectors are the special ones that don't.
Three vectors start on the unit circle: u (white, drag to rotate) and the two eigenvectors, labeled by their eigenvalues λ₁ and λ₂. Slide the transform from 0% to 100% to apply $\textcolor{#C9A227}{\mathbf{C}}$. Watch what happens: the λ₁ eigenvector stretches to 6x along its line, the λ₂ eigenvector stays put ($\textcolor{#1D9E75}{\lambda_2 = 1}$, so the matrix does not change it at all), but u gets knocked off its original direction. The red dashed line shows how far it was deflected. Only eigenvectors survive the matrix without rotating.
Continuing the toy example, the covariance matrix was $\textcolor{#C9A227}{\mathbf{C}} = \begin{bmatrix} \textcolor{#E8725C}{5} & \textcolor{#D36BE0}{2} \\\\ \textcolor{#D36BE0}{2} & \textcolor{#E8725C}{2} \end{bmatrix}$. Try the direction $\textcolor{#D85A30}{\mathbf{u}_1} = \frac{1}{\sqrt{5}}(2, 1)$:
$\begin{bmatrix} 5 & 2 \\\\ 2 & 2 \end{bmatrix} \textcolor{#D85A30}{\begin{bmatrix} 0.894 \\\\ 0.447 \end{bmatrix}} = \begin{bmatrix} 5.36 \\\\ 2.68 \end{bmatrix} = \textcolor{#D85A30}{6} \textcolor{#D85A30}{\begin{bmatrix} 0.894 \\\\ 0.447 \end{bmatrix}}$
The output points in the same direction as the input, scaled by $\textcolor{#D85A30}{6}$. So $(2, 1) / \sqrt{5}$ is an eigenvector with eigenvalue $\textcolor{#D85A30}{\lambda_1 = 6}$. This is PC1: the direction where the data has the most spread. It tilts toward the x-axis because x has more variance than y.
The perpendicular direction $\textcolor{#1D9E75}{\mathbf{u}_2} = \frac{1}{\sqrt{5}}(1, -2)$ is the other eigenvector, with eigenvalue $\textcolor{#1D9E75}{\lambda_2 = 1}$. Much less variance. That is PC2.
| $\mathbf{u}$ | Eigenvector. A direction that $\textcolor{#C9A227}{\mathbf{C}}$ stretches without rotating. In PCA, this is a principal component. |
| $\lambda$ | Eigenvalue. How much $\textcolor{#C9A227}{\mathbf{C}}$ stretches in that direction. In PCA, this equals the variance of the data along $\mathbf{u}$. |
For the covariance matrix specifically, each eigenvector is a direction of independent spread, and its eigenvalue tells you how much spread there is. Sort them by eigenvalue (largest first) and you get the principal components in order of importance:
- PC1: largest eigenvalue ($\textcolor{#D85A30}{\lambda_1 = 6}$), direction of maximum variance.
- PC2: second largest ($\textcolor{#1D9E75}{\lambda_2 = 1}$), maximum remaining variance, perpendicular to PC1.
- PC3, PC4, ...: each captures the next largest chunk of remaining variance, always perpendicular to all previous.
In the 2D demo above, the covariance matrix has two eigenvectors (the two arrows). The longer arrow is PC1 because the data is more spread out along it. In 13,824 dimensions the covariance matrix has 13,824 eigenvectors, but only the first handful carry meaningful variance. The rest carry negligible or exactly zero variance.
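The toy eigenpairs and the sorting step can be checked numerically. This sketch uses `np.linalg.eigh`, NumPy's routine for symmetric matrices, which returns eigenvalues in ascending order, so they are reversed here. The sign of each returned eigenvector is arbitrary: $(2, 1)/\sqrt{5}$ and $-(2, 1)/\sqrt{5}$ describe the same line.

```python
import numpy as np

C = np.array([[5.0, 2.0], [2.0, 2.0]])

# eigh is meant for symmetric matrices; eigenvalues come back smallest first.
eigenvalues, eigenvectors = np.linalg.eigh(C)

order = np.argsort(eigenvalues)[::-1]     # re-sort: largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]     # columns are the unit eigenvectors

pc1, pc2 = eigenvectors[:, 0], eigenvectors[:, 1]
print(eigenvalues)                        # variances along PC1 and PC2
print(C @ pc1, eigenvalues[0] * pc1)      # same vector: C only stretches pc1
print(pc1 @ pc2)                          # perpendicular, so the dot product is ~0
```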
The dual trick
The covariance matrix $\textcolor{#C9A227}{\mathbf{C}} = \frac{1}{\textcolor{#8CB4D5}{M}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \textcolor{#E07B9D}{\tilde{\mathbf{X}}}$ has size $D \times D$. For the SDF dataset, that is $13{,}824 \times 13{,}824$, roughly 191 million entries. Eigendecomposing it directly would be very expensive.
But notice: we only have $\textcolor{#8CB4D5}{M = 48}$ shapes. 48 data points in a 13,824-dimensional space can only span a 48-dimensional subspace at most. That means at most 48 eigenvectors can have nonzero eigenvalues. The other 13,776 eigenvectors all have eigenvalue zero (no variance in those directions, because there is no data there). We are doing a huge amount of work to find directions that are mostly empty.
The dual trick flips the multiplication order. Instead of $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \textcolor{#E07B9D}{\tilde{\mathbf{X}}}$ (big, $D \times D$), compute $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top$ (small, $\textcolor{#8CB4D5}{M} \times \textcolor{#8CB4D5}{M}$):
$\textcolor{#C9A227}{\mathbf{G}} = \textcolor{#E07B9D}{\tilde{\mathbf{X}}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \quad (\textcolor{#8CB4D5}{48} \times \textcolor{#8CB4D5}{48})$
This is the Gram matrix. Entry $(i, j)$ is the dot product between centered shape $i$ and centered shape $j$. It measures how similar two shapes are (after removing the mean).
Using the toy example: $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}$ was $\textcolor{#8CB4D5}{4} \times 2$, so $\textcolor{#C9A227}{\mathbf{C}} = \frac{1}{\textcolor{#8CB4D5}{4}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \textcolor{#E07B9D}{\tilde{\mathbf{X}}}$ is $2 \times 2$ and $\textcolor{#C9A227}{\mathbf{G}} = \textcolor{#E07B9D}{\tilde{\mathbf{X}}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top$ is $\textcolor{#8CB4D5}{4} \times \textcolor{#8CB4D5}{4}$. In that case $\textcolor{#C9A227}{\mathbf{C}}$ is already small, so the trick is unnecessary. But when $D = 13{,}824$ and $\textcolor{#8CB4D5}{M = 48}$, it replaces a $13{,}824 \times 13{,}824$ eigendecomposition with a $\textcolor{#8CB4D5}{48} \times \textcolor{#8CB4D5}{48}$ one.
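The key mathematical fact: $\textcolor{#C9A227}{\mathbf{G}}$ and $\textcolor{#C9A227}{\mathbf{C}}$ share the same nonzero eigenvalues up to a factor of $\textcolor{#8CB4D5}{M}$. If $\lambda_k$ is a nonzero eigenvalue of $\textcolor{#C9A227}{\mathbf{G}}$, then $\lambda_k / \textcolor{#8CB4D5}{M}$ is an eigenvalue of $\textcolor{#C9A227}{\mathbf{C}}$. And you can recover the principal directions from the Gram eigenvectors by multiplying back by $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top$: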
$\mathbf{u}_k = \frac{1}{\sqrt{\lambda_k}}\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \mathbf{v}_k$
| $\textcolor{#C9A227}{\mathbf{G}}$ | Gram matrix ($\textcolor{#8CB4D5}{M} \times \textcolor{#8CB4D5}{M}$). Entry $(i, j)$ is the dot product between centered shapes $i$ and $j$. |
| $\mathbf{v}_k$ | The $k$-th eigenvector of $\textcolor{#C9A227}{\mathbf{G}}$ (a 48-dimensional vector: one weight per shape). |
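| $\lambda_k$ | The $k$-th eigenvalue of $\textcolor{#C9A227}{\mathbf{G}}$. Dividing by $\textcolor{#8CB4D5}{M}$ gives the matching eigenvalue of the covariance matrix $\textcolor{#C9A227}{\mathbf{C}}$, which is the variance along $\mathbf{u}_k$. |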
| $\mathbf{u}_k$ | The $k$-th principal direction, recovered in the original 13,824-dimensional space. |
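The intuition: $\mathbf{v}_k$ tells you how to weight the shapes and $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \mathbf{v}_k$ takes that weighted combination of shapes and produces the actual direction in voxel-space. The $1/\sqrt{\lambda_k}$ just normalizes it to unit length, because $\|\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \mathbf{v}_k\|^2 = \mathbf{v}_k^\top \textcolor{#C9A227}{\mathbf{G}} \mathbf{v}_k = \lambda_k$ for a unit-length $\mathbf{v}_k$. To see that the result really is an eigenvector of the big matrix, multiply $\textcolor{#C9A227}{\mathbf{G}}\mathbf{v}_k = \lambda_k \mathbf{v}_k$ on the left by $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top$: this gives $\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \textcolor{#E07B9D}{\tilde{\mathbf{X}}}\,(\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \mathbf{v}_k) = \lambda_k\,(\textcolor{#E07B9D}{\tilde{\mathbf{X}}}^\top \mathbf{v}_k)$.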
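A sketch of the dual trick at the real sizes, using random numbers as a stand-in for the 48 SDF vectors (an assumption, since the actual dataset is not reproduced here). It recovers a few leading directions from the $48 \times 48$ Gram matrix and checks them against the covariance matrix without ever building the $13{,}824 \times 13{,}824$ array.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 48, 13_824
X = rng.normal(size=(M, D))               # stand-in data, not real SDFs
Xc = X - X.mean(axis=0)

G = Xc @ Xc.T                             # Gram matrix, M x M
gram_vals, V = np.linalg.eigh(G)          # ascending order
gram_vals, V = gram_vals[::-1], V[:, ::-1]

K = 5                                     # keep a few leading components
# Columns u_k = X^T v_k / sqrt(lambda_k), with lambda_k an eigenvalue of G.
U = (Xc.T @ V[:, :K]) / np.sqrt(gram_vals[:K])

cov_vals = gram_vals[:K] / M              # matching eigenvalues of C
# Apply C to each u_k as X^T (X u) / M, so the D x D matrix is never formed.
C_times_U = Xc.T @ (Xc @ U) / M

print(np.linalg.norm(U, axis=0))          # each column has unit length
print(np.allclose(C_times_U, U * cov_vals))
```

Only the leading components are recovered here on purpose: after centering, the smallest Gram eigenvalue is numerically zero, and dividing by its square root would blow up.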
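This reduces the eigendecomposition from an $O(D^3)$ problem to $O(\textcolor{#8CB4D5}{M}^3)$. For $D = 13{,}824$ and $\textcolor{#8CB4D5}{M = 48}$, that is roughly $2.6 \times 10^{12}$ operations down to about $110{,}000$. Forming $\textcolor{#C9A227}{\mathbf{G}}$ in the first place adds about $\textcolor{#8CB4D5}{M}^2 D \approx 3.2 \times 10^{7}$ multiply-adds, which is still tiny by comparison.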
Projection and reconstruction
To compress a shape, project its centered vector onto each principal direction. The dot product gives a single scalar: how far along that direction the point sits.
$\textcolor{#18B8C8}{c_k} = \langle \mathbf{x} - \textcolor{#7F77DD}{\boldsymbol{\mu}},~ \mathbf{u}_k \rangle$
The dot product $\langle \mathbf{a}, \mathbf{b} \rangle$ multiplies corresponding entries and sums: $a_1 b_1 + a_2 b_2 + \cdots$. The result is a single number. Positive means $\mathbf{a}$ points roughly the same way as $\mathbf{b}$, negative means roughly opposite, zero means perpendicular.
Continuing the toy example: the point $(3, 0)$ is already centered (the mean was zero). Project it onto PC1 $= \frac{1}{\sqrt{5}}(2, 1)$:
$\textcolor{#D85A30}{c_1} = (3)(0.894) + (0)(0.447) = \textcolor{#D85A30}{2.68}$
Project onto PC2 $= \frac{1}{\sqrt{5}}(1, -2)$:
$\textcolor{#1D9E75}{c_2} = (3)(0.447) + (0)(-0.894) = \textcolor{#1D9E75}{1.34}$
The point $(3, 0)$ is now represented as two coefficients: $\textcolor{#D85A30}{c_1 = 2.68}$ along PC1 and $\textcolor{#1D9E75}{c_2 = 1.34}$ along PC2. These are its coordinates in the principal component basis.
To reconstruct, multiply each coefficient by its direction and add back the mean:
$\textcolor{#18B8C8}{\hat{\mathbf{x}}} = \textcolor{#7F77DD}{\boldsymbol{\mu}} + \sum_{k=1}^{K} \textcolor{#18B8C8}{c_k} \mathbf{u}_k$
With both components ($K = 2$), the reconstruction is exact: $\textcolor{#D85A30}{2.68} \times (0.894, 0.447) + \textcolor{#1D9E75}{1.34} \times (0.447, -0.894) = (3, 0)$. With only PC1 ($K = 1$), you get $\textcolor{#D85A30}{2.68} \times (0.894, 0.447) = \textcolor{#18B8C8}{(2.40, 1.20)}$. Close, but not exact. The error $(0.60, -1.20)$ is exactly the PC2 component we dropped.
| $\textcolor{#18B8C8}{c_k}$ | Coefficient for the $k$-th component. A single scalar encoding "how much of direction $k$ is in this shape." |
| $K$ | Number of components kept. More components, better reconstruction, larger representation. |
| $\textcolor{#18B8C8}{\hat{\mathbf{x}}}$ | The reconstructed shape. An approximation of $\mathbf{x}$ using only $K$ numbers plus the shared mean and directions. |
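The toy projection and reconstruction, written out as a sketch. The helper name `reconstruct` and the row-wise layout of `U` are choices made here for illustration.

```python
import numpy as np

mu = np.zeros(2)                                         # toy data was already centered
U = np.array([[2.0, 1.0], [1.0, -2.0]]) / np.sqrt(5.0)   # rows: PC1, PC2
x = np.array([3.0, 0.0])

c = U @ (x - mu)                                         # c_k = <x - mu, u_k>

def reconstruct(c, U, mu, K):
    """x_hat = mu + sum over the first K components of c_k * u_k."""
    return mu + c[:K] @ U[:K]

x_hat_1 = reconstruct(c, U, mu, K=1)                     # PC1 only
x_hat_2 = reconstruct(c, U, mu, K=2)                     # both components
print(np.round(c, 2))                  # coefficients along PC1 and PC2
print(np.round(x_hat_1, 2))            # approximate
print(np.round(x_hat_2, 2))            # recovers x
print(np.round(x - x_hat_1, 2))        # the error is the dropped PC2 part
```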
This is exactly what the six-panel viewer at the top computes. At $K = 2$, only two scalars encode the entire 13,824-dimensional SDF. At 99.9% variance, enough components are included to recover sharp edges. And this is what the 2D demo shows in miniature: projecting points onto PC1 is the $K = 1$ version of this formula. The dashed residual lines are the PC2 (and higher) components being thrown away.
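Putting the pieces together, here is a rough end-to-end sketch of what a viewer like this might compute. It is not the article's actual implementation. The dataset (24 random spheres and 24 random boxes, no tori), the helper names, and the `1e-10` cutoff for discarding numerically empty directions are all assumptions.

```python
import numpy as np

N = 24
axis = np.linspace(-1.0, 1.0, N)
P = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

def sphere(r):
    return np.linalg.norm(P, axis=1) - r

def box(half):
    q = np.abs(P) - half                       # half holds three half-extents
    return np.linalg.norm(np.maximum(q, 0.0), axis=1) + np.minimum(q.max(axis=1), 0.0)

rng = np.random.default_rng(0)
shapes = [sphere(r) for r in rng.uniform(0.3, 0.8, 24)]
shapes += [box(h) for h in rng.uniform(0.3, 0.8, (24, 3))]
X = np.array(shapes)                           # (48, 13824)

mu = X.mean(axis=0)
Xc = X - mu
vals, V = np.linalg.eigh(Xc @ Xc.T)            # dual trick: 48 x 48 Gram matrix
vals, V = vals[::-1], V[:, ::-1]
keep = vals > vals[0] * 1e-10                  # drop numerically empty directions
vals, V = vals[keep], V[:, keep]
U = (Xc.T @ V) / np.sqrt(vals)                 # principal directions as columns
cumulative = np.cumsum(vals) / vals.sum()

def reconstruct(x, target):
    """Rebuild x from the fewest components reaching the target variance fraction."""
    K = min(int(np.searchsorted(cumulative, target) + 1), len(vals))
    c = (x - mu) @ U[:, :K]                    # projection coefficients
    return mu + U[:, :K] @ c, K

x = X[-1]                                      # one of the boxes
results = []
for target in (0.90, 0.99, 0.999, 0.9999):
    x_hat, K = reconstruct(x, target)
    rmse = float(np.sqrt(np.mean((x - x_hat) ** 2)))
    results.append((target, K, rmse))
    print(f"{target:.2%}: K = {K:2d}, RMSE = {rmse:.5f}")
```

Reshaping `x_hat` back to `(24, 24, 24)` and extracting its zero level set would give the surface that each panel renders.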