DoMINO (Decomposable Multi-scale Iterative Neural Operator) is a point-cloud-based neural operator by NVIDIA Research that predicts CFD-quality flow fields directly from surface geometry. Given an STL mesh, it outputs both volume fields (pressure, velocity, turbulent viscosity) and surface fields (pressure, wall shear stress) at real-time speed, a job that takes a conventional CFD solver hours.
I built an interactive explorer that lets you walk through every layer of the architecture in 3D. This post explains the pipeline; click any "Explore" button to open the visualizer focused on that layer.
Input: surface geometry
Everything starts with a triangulated surface mesh (STL) of the vehicle. The original CFD meshes contain ~10 million surface elements per case; DoMINO samples ~100K surface points from these as input. Crucially, mesh connectivity is discarded: only the raw xyz coordinates are kept. This makes the model mesh-independent. As the paper notes, predictions are validated on uniformly sampled point clouds rather than the simulation mesh, and the model is "not dependent on how these points are sampled."
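To make the "sample points, drop connectivity" step concrete, here is a minimal NumPy sketch of area-weighted sampling from a triangle mesh. This is one common way to get a uniform point cloud, not necessarily what DoMINO's data pipeline does; the function name `sample_surface_points` and the toy square mesh are my own placeholders.

```python
import numpy as np

def sample_surface_points(vertices, faces, n_points, seed=0):
    """Draw n_points uniformly by area from a triangle mesh; returns (n_points, 3)."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                                  # (F, 3, 3) triangle corners
    # Triangle areas from the cross product of two edges.
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    # Pick triangles with probability proportional to their area.
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric weights via the square-root trick.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    w = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)
    # Only xyz coordinates come out; the face connectivity is gone.
    return np.einsum("nk,nkd->nd", w, tri[idx])

# Toy mesh (assumption): a unit square in the z = 0 plane, split into two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
points = sample_surface_points(vertices, faces, 1000)
```

The output is just an `(N, 3)` array, which is all the model sees of the geometry.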
Coordinates are normalized to the computational domain bounding box (12 m × 4.5 m × 3.3 m), which extends well beyond the car into the wake region. This normalization ensures that the point convolution kernel radii have consistent physical meaning across different car placements.
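As a rough sketch of that normalization: the box extents below come from the numbers above, but the box origin and the target range of [-1, 1] are assumptions I picked for illustration.

```python
import numpy as np

def normalize_to_domain(points, domain_min, domain_max):
    """Map coordinates linearly so the domain box becomes [-1, 1] on each axis."""
    domain_min = np.asarray(domain_min, dtype=float)
    domain_max = np.asarray(domain_max, dtype=float)
    return 2.0 * (points - domain_min) / (domain_max - domain_min) - 1.0

# Hypothetical box placement; only the 12 x 4.5 x 3.3 m extents are from the post.
domain_min = np.array([-3.5, -2.25, 0.0])
domain_max = domain_min + np.array([12.0, 4.5, 3.3])

# Two opposite corners and the centre of the box.
points = np.array([domain_min, domain_max, 0.5 * (domain_min + domain_max)])
normalized = normalize_to_domain(points, domain_min, domain_max)
```

The two corners land on -1 and +1 and the box centre lands on the origin, regardless of where the car sits inside the box.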
The same surface mesh is also used to compute a signed distance field (SDF) on the volume grid: the shortest signed distance from every voxel center to the car surface. The SDF and its gradient components are appended to the learned geometry features to provide explicit topology information.
One of the cleverest details: the SDF gradient (∇SDF) at the surface approximates the outward-pointing normal vector, without needing normals defined in the input at all. This is a big deal in practice because surface normals can be unreliable when transferring geometry across file formats or recomputing on non-watertight meshes. The SDF sidesteps this entirely.
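You can check the "gradient equals normal" claim on a shape where the answer is known. This sketch uses an analytic sphere SDF as a stand-in for the car (an assumption; a real pipeline would compute the SDF from the mesh) and takes finite differences on the voxel grid.

```python
import numpy as np

# Voxel centres of a small grid around a sphere of radius 0.5.
n = 33
axis = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
radius = 0.5
sdf = np.sqrt(X**2 + Y**2 + Z**2) - radius     # negative inside, positive outside

# Finite differences give the three gradient components on the grid.
h = axis[1] - axis[0]
gx, gy, gz = np.gradient(sdf, h)
grad = np.stack([gx, gy, gz], axis=-1)

# The voxel at (0.5, 0, 0) sits on the surface; the outward normal there is +x.
i, j, k = 24, 16, 16
normal_estimate = grad[i, j, k] / np.linalg.norm(grad[i, j, k])
```

No normals were supplied anywhere, yet `normal_estimate` points along +x, which is the outward direction at that surface point.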
Multi-scale ball queries
This is where the architecture gets interesting. DoMINO uses point convolution kernels implemented with GPU-accelerated ball query layers (via NVIDIA Warp). For each cell in a 3D volume grid, the kernel searches for surface vertices within a specified radius and operates on the resulting unstructured neighborhood.
The key insight is doing this at multiple scales, using a range of kernel sizes to capture both finer geometry features and long-range interactions:
| Scale | What it captures |
|---|---|
| Fine | Panel edges, mirror geometry, wheel spokes |
| Medium | Fender curvature, windshield angle |
| Coarse | Overall body sections, wheelbase |
| Global | Full car silhouette, aerodynamic envelope |
The smallest radius creates a sparse representation: only grid cells very close to the surface find any neighbors. The largest radius means nearly every cell in the domain sees the car. Together, they give the network both fine local detail and broad context.
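Here is a brute-force NumPy sketch of that sparse-to-dense behaviour. It is not the Warp implementation; the radii, the `max_neighbors` cap, and the sphere standing in for the car are all hypothetical values chosen for the demo.

```python
import numpy as np

def ball_query(grid_centers, surface_points, radius, max_neighbors=32):
    """Per grid cell, indices of up to max_neighbors surface points within radius."""
    # Pairwise distances (G, S): fine for a toy case, far too slow at real scale.
    d = np.linalg.norm(grid_centers[:, None, :] - surface_points[None, :, :], axis=-1)
    neighbors = []
    for row in d:
        idx = np.flatnonzero(row <= radius)
        idx = idx[np.argsort(row[idx])][:max_neighbors]   # keep the closest ones
        neighbors.append(idx)
    return neighbors

# Stand-in "car": 2000 points on a sphere of radius 0.5.
rng = np.random.default_rng(0)
surface = rng.normal(size=(2000, 3))
surface = 0.5 * surface / np.linalg.norm(surface, axis=1, keepdims=True)

# An 8 x 8 x 8 grid of cell centres covering [-1, 1]^3.
axis = np.linspace(-1.0, 1.0, 8)
grid_centers = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

radii = [0.1, 0.3, 0.9, 2.5]      # hypothetical fine-to-global scales
coverage = []
for r in radii:
    hits = ball_query(grid_centers, surface, r)
    coverage.append(np.mean([len(h) > 0 for h in hits]))
```

`coverage` holds the fraction of grid cells that found at least one neighbor at each scale. It grows with the radius, and at the largest radius every cell sees the surface.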
Explore: Ball Queries (3D, interactive) →
Learning geometry features
Each ball query scale feeds into a point convolution block that operates on the unstructured neighbor point sets and produces per-grid-cell feature representations. These learned features are then propagated through the computational domain using CNN blocks containing convolution, pooling, and unpooling layers.
This is done iteratively in a multi-resolution fashion, propagating the geometry representation between the surface and computational domain grids to learn both short- and long-range dependencies. The features from all scales are combined to form a global geometry representation.
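To see why pooling and unpooling spread information through the domain, here is a stripped-down NumPy sketch. It leaves out the learned convolutions entirely and only shows the down-and-up path; the grid size and the residual-style sum are my own simplifications, not the model's actual block.

```python
import numpy as np

def avg_pool2(x):
    """Average-pool a (D, H, W) grid by a factor of 2 along each axis."""
    d, h, w = x.shape
    return x.reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))

def unpool2(x):
    """Nearest-neighbour upsampling by a factor of 2 along each axis."""
    return x.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

# A single near-surface cell carries a feature; everything else is empty.
features = np.zeros((8, 8, 8))
features[3, 3, 3] = 1.0

coarse = avg_pool2(avg_pool2(features))               # 8^3 -> 4^3 -> 2^3
propagated = features + unpool2(unpool2(coarse))      # back up to 8^3
```

After one trip down and up, the signal from one cell has reached a 4 x 4 x 4 block of cells, which is how cells far from the surface end up with geometry information.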
Explore: Feature Propagation →
Global geometry encoding
The multi-scale geometry features from the point convolution path are combined with the SDF-derived features (distance field + gradient components) into a unified geometry encoding on the volume grid. This encoding captures both learned implicit surface features and explicit analytical distance information.
The SDF provides smooth, globally consistent distance and topology data that complements the point convolution features. Where the learned features capture surface detail from local neighborhoods, the SDF tells the network "how far am I from the surface, and which side am I on?"
From grid to query points
The geometry encoding lives on a regular 3D grid, but predictions are needed at arbitrary query points scattered through the volume (or on the surface). Two mechanisms bridge this gap:
Local geometry extraction constructs a computational stencil of neighboring points around each query location and extracts features from the global geometry representation at that local region. This is analogous to how finite volume methods operate on local stencils rather than the full domain.
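One way to picture this step is a small stencil around the query point, with grid features read off at each stencil point by trilinear interpolation. Both choices are assumptions for illustration: the 7-point axis-aligned stencil and the interpolation scheme are mine, and the helper names are hypothetical.

```python
import numpy as np

def stencil_points(query, spacing):
    """Query point plus six axis-aligned neighbours (a hypothetical 7-point stencil)."""
    offsets = spacing * np.array(
        [[0, 0, 0], [1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]],
        dtype=float,
    )
    return query[None, :] + offsets

def trilinear_sample(grid, points, lo, hi):
    """Interpolate a (D, H, W, C) feature grid at points (N, 3) inside the box [lo, hi]."""
    shape = np.array(grid.shape[:3])
    u = (points - lo) / (hi - lo) * (shape - 1)        # continuous grid indices
    i0 = np.clip(np.floor(u).astype(int), 0, shape - 2)
    t = u - i0                                         # fractional offsets in [0, 1]
    out = np.zeros((len(points), grid.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                corner = grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

# Toy encoding: one channel equal to x + 2y + 3z on a 5^3 grid over the unit cube.
axis = np.linspace(0.0, 1.0, 5)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
grid = (X + 2.0 * Y + 3.0 * Z)[..., None]
lo, hi = np.zeros(3), np.ones(3)

query = np.array([0.4, 0.5, 0.6])
stencil = stencil_points(query, spacing=0.1)
local_features = trilinear_sample(grid, stencil, lo, hi)   # shape (7, 1)
```

Because the toy field is linear, the interpolated values match `x + 2y + 3z` at every stencil point, which makes it easy to sanity-check the indexing.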
Explore: Local Geometry Extraction →
Per-point geometric features (SDF value, scaled SDF, vectors to the nearest surface and to the geometry centroid) provide additional spatial context. These features give the network explicit knowledge of each query point's relationship to the car surface, independent of its absolute position in the domain.
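A sketch of how such a feature vector could be assembled. The exact scaling used for the "scaled SDF" is not given above, so dividing by a `length_scale` is a placeholder assumption, as are the function name and the brute-force nearest-point search.

```python
import numpy as np

def per_point_features(query, surface_points, sdf, length_scale):
    """Stack SDF, scaled SDF, vector to nearest surface point, vector to centroid: (N, 8)."""
    d = np.linalg.norm(query[:, None, :] - surface_points[None, :, :], axis=-1)
    nearest = surface_points[np.argmin(d, axis=1)]
    to_surface = nearest - query
    to_centroid = surface_points.mean(axis=0) - query
    return np.concatenate(
        [sdf[:, None], sdf[:, None] / length_scale, to_surface, to_centroid], axis=1
    )

# Stand-in surface: six points on a sphere of radius 0.5 centred at the origin.
surface_points = 0.5 * np.array(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float
)
query = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.2]])      # one outside, one inside
sdf = np.linalg.norm(query, axis=1) - 0.5                 # analytic SDF for the sphere
features = per_point_features(query, surface_points, sdf, length_scale=0.5)
```

Every entry is relative to the geometry: shift the car and the query points together and the feature vector stays the same.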
Explore: Positional Features →
Solution prediction
The aggregation network combines the local geometry encoding with the per-point features and predicts a solution vector at each point in the computational stencil. These predictions are then averaged using an inverse-distance weighting (IDW) scheme to produce the final solution at the query point.
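The IDW step itself is only a few lines. In this sketch the `power` exponent and the `eps` guard against division by zero are assumptions; the post only says the weighting is by inverse distance.

```python
import numpy as np

def idw_average(query, stencil_points, stencil_predictions, power=1.0, eps=1e-8):
    """Inverse-distance-weighted average of per-stencil-point predictions."""
    dist = np.linalg.norm(stencil_points - query, axis=1)
    weights = 1.0 / (dist**power + eps)
    weights /= weights.sum()
    return weights @ stencil_predictions

# Two stencil points at distances 1 and 3 from the query, one predicted field each.
query = np.zeros(3)
stencil = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
predictions = np.array([[1.0], [5.0]])
solution = idw_average(query, stencil, predictions)
```

The closer point gets three times the weight of the farther one, so the result is pulled toward its prediction (roughly 2.0 here rather than the plain mean of 3.0).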
This stencil-based approach is what makes DoMINO "decomposable": the prediction at any point depends only on local information, so the model scales to arbitrary domain sizes without global constraints. It's directly analogous to how classical CFD methods solve on local stencils.
Importantly, the global geometry encoding is shared between surface and volume predictions, but the aggregation networks are separate. If query points are sampled on the surface, the network predicts surface fields; if sampled in the volume, it predicts volume fields.
Explore: Solution Prediction →
The output
DoMINO predicts two sets of fields:
Volume fields (at ~50K query points in the flow domain):
- Pressure (p): positive at stagnation (front), negative at suction (roof/wake). Primary driver of aerodynamic drag.
- Velocity (Ux, Uy, Uz): three components of the flow velocity vector. Ux dominates (streamwise direction); reversed in wake recirculation zones.
- Turbulent viscosity (νt): high in boundary layers and wake. Indicates turbulent mixing intensity.
Surface fields (at points on the car body):
- Pressure (p): surface pressure distribution, critical for drag and lift prediction.
- Wall shear stress (τx, τy, τz): three components of the viscous stress at the wall. Essential for skin friction drag and flow separation analysis.
Together these cover the time-averaged flow solution both on the surface and in the volume.
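The surface fields are what you integrate to get forces. Here is a sketch under an explicit sign convention, which is an assumption: normals point out of the body into the fluid, and the shear stress is the traction the fluid exerts on the wall. Solvers and datasets differ on both signs, so check the convention before reusing this. The two-element "car" is a made-up toy.

```python
import numpy as np

def integrate_force(pressure, shear, normals, areas):
    """Sum pressure and viscous contributions over surface elements; returns (3,)."""
    pressure_force = -(pressure * areas)[:, None] * normals   # pressure pushes against n
    viscous_force = shear * areas[:, None]
    return (pressure_force + viscous_force).sum(axis=0)

# Toy body in a +x flow: one front-facing and one rear-facing element (Pa, m^2).
pressure = np.array([100.0, -50.0])                 # stagnation in front, suction behind
normals = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shear = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
areas = np.array([2.0, 2.0])

force = integrate_force(pressure, shear, normals, areas)
drag = force[0]                                     # streamwise component, in N
```

Both the front stagnation pressure and the rear suction push the body downstream, and skin friction adds a small extra term on top.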
The dataset
The explorer uses the DrivAerML dataset: 500 geometrically morphed variants of the DrivAer Notchback car, each with a high-fidelity, scale-resolving (hybrid RANS-LES) CFD solution. Each sample contains ~150 million volume mesh elements and ~10 million surface elements, with drag forces ranging from 300 N to 600 N across the variants. Training uses 90% of the data; the remaining 10% is held out for testing, including out-of-distribution cases at the drag extremes.
The point cloud and 3D model shown in the explorer are from the baseline DrivAer configuration.