Real-Time 3D Reconstruction: Meshing at Interactive Rates
Advances in spatial mapping for V2: dense reconstruction, mesh quality, and the compute challenge of real-time 3D.
V1's spatial mapping was good enough for plane detection. V2 needs dense, accurate meshes for realistic occlusion and physics.
V1 Limitations
Current meshing:
- Resolution: ~5cm voxels
- Update rate: Full mesh every 2-3 seconds
- Hole filling: Minimal
- Surface quality: Blocky, noisy
For V2, we need:
- Resolution: 1-2cm
- Update rate: Incremental, sub-second
- Hole filling: Intelligent completion
- Surface quality: Smooth, watertight
Reconstruction Pipeline
Depth Frames → Depth Filtering → TSDF Integration → Mesh Extraction → Mesh Simplification → Collision Mesh
Depth Filtering
Raw depth has holes and noise. Filter before integration:
- Bilateral filtering (edge-preserving smoothing)
- Temporal averaging (accumulate confidence)
- Outlier rejection (statistical filtering)
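As a sketch of the edge-preserving step, here is a minimal bilateral filter for a depth map in NumPy. The function name and parameter defaults are illustrative, not the production implementation; holes are assumed to be encoded as zero depth:

```python
import numpy as np

def bilateral_filter_depth(depth, radius=2, sigma_s=2.0, sigma_r=0.05):
    """Edge-preserving smoothing for a depth map in meters.

    Pixels with depth == 0 are treated as holes and left untouched.
    sigma_s weights spatial distance; sigma_r weights depth difference,
    so neighbors across a depth discontinuity get ~zero weight.
    """
    h, w = depth.shape
    out = np.zeros_like(depth)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # spatial kernel
    pad = np.pad(depth, radius, mode="edge")
    for y in range(h):
        for x in range(w):
            d = depth[y, x]
            if d == 0:  # hole: nothing to filter
                continue
            window = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            valid = window > 0
            # range kernel: down-weight neighbors across depth edges
            rng = np.exp(-((window - d) ** 2) / (2 * sigma_r**2))
            wts = spatial * rng * valid
            out[y, x] = (wts * window).sum() / wts.sum()
    return out
```

Because the range kernel collapses across large depth jumps, a step edge in the input survives filtering instead of being blurred into a slope.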
TSDF Integration
A truncated signed distance function represents the surface implicitly:
- Voxel grid stores distance to nearest surface
- New depth frames update distances via running average
- Memory-efficient: only store near-surface voxels
V1: Dense voxel array, 5cm resolution, limited volume
V2: Sparse voxel structures (octree or hash), 1cm resolution, room-scale
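The update rule is a weighted running average per voxel. Below is a deliberately simplified dense 1D illustration along a single camera ray; the `TSDFVolume` class and its parameters are hypothetical, and V2's real structure would be sparse and 3D:

```python
import numpy as np

class TSDFVolume:
    """Minimal dense TSDF sketch, 1D along a camera ray for clarity.

    Illustrates only the truncation band and the running-average
    update; a real implementation hashes 3D voxel blocks.
    """

    def __init__(self, n_voxels, voxel_size, trunc):
        self.centers = (np.arange(n_voxels) + 0.5) * voxel_size
        self.tsdf = np.ones(n_voxels)     # +1 = "empty space" prior
        self.weight = np.zeros(n_voxels)  # observation confidence
        self.trunc = trunc

    def integrate(self, depth, obs_weight=1.0):
        # signed distance from each voxel center to the observed surface
        sdf = depth - self.centers
        # only touch voxels within the truncation band (near the surface);
        # voxels far behind the surface stay unknown -> memory-efficient
        mask = sdf > -self.trunc
        d = np.clip(sdf[mask] / self.trunc, -1.0, 1.0)
        w = self.weight[mask]
        self.tsdf[mask] = (w * self.tsdf[mask] + obs_weight * d) / (w + obs_weight)
        self.weight[mask] = w + obs_weight
```

The surface lives at the zero-crossing of the stored distances; repeated observations average noise down while the weight tracks confidence.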
Mesh Extraction
Marching cubes extracts a triangle mesh from the TSDF:
- Classic algorithm, well-understood
- Parallelizable (per-voxel)
- Mesh complexity proportional to surface area, not volume
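The per-cell logic is compact: each cell's 8-bit corner configuration selects a triangle pattern from a lookup table, and only cells the surface crosses emit triangles. A sketch of the case-index computation (triangle table omitted; names are illustrative):

```python
import numpy as np

# Corner offsets of a voxel cell. The ordering is a convention;
# any fixed order works as long as the triangle table matches it.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]

def cell_case_index(tsdf, x, y, z):
    """8-bit marching-cubes case for the cell at (x, y, z):
    bit i is set when corner i is inside the surface (tsdf < 0)."""
    idx = 0
    for i, (dx, dy, dz) in enumerate(CORNERS):
        if tsdf[x + dx, y + dy, z + dz] < 0:
            idx |= 1 << i
    return idx

def surface_cells(tsdf):
    """Cells the surface crosses: case not 0 (all out) or 255 (all in).
    Only these emit triangles, so mesh cost tracks surface area, not
    volume. Each cell is independent -> trivially parallelizable."""
    cells = []
    nx, ny, nz = tsdf.shape
    for x in range(nx - 1):
        for y in range(ny - 1):
            for z in range(nz - 1):
                c = cell_case_index(tsdf, x, y, z)
                if 0 < c < 255:
                    cells.append((x, y, z, c))
    return cells
```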
Mesh Simplification
Raw marching cubes produces too many triangles. Simplify:
- Quadric error metrics for vertex decimation
- Preserve sharp edges and features
- Target triangle budget for rendering
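The core of the quadric error metric fits in a few lines: each triangle's supporting plane contributes a rank-one quadric, and the cost of placing a vertex at `v` is the quadratic form `v^T Q v`, the sum of squared distances to all accumulated planes. A sketch with hypothetical helper names:

```python
import numpy as np

def plane_quadric(p0, p1, p2):
    """Fundamental quadric K = h h^T for the triangle's supporting plane,
    where h = (a, b, c, d) with unit normal and plane ax + by + cz + d = 0."""
    n = np.cross(p1 - p0, p2 - p0)
    n = n / np.linalg.norm(n)
    d = -n.dot(p0)
    h = np.append(n, d)          # homogeneous plane coefficients
    return np.outer(h, h)

def vertex_error(Q, v):
    """Quadric error v^T Q v: sum of squared distances of the
    homogeneous point (x, y, z, 1) to the planes accumulated in Q."""
    vh = np.append(v, 1.0)
    return vh @ Q @ vh
```

Decimation then repeatedly collapses the edge whose merged vertex has the lowest error; because sharp edges accumulate planes with disagreeing normals, collapsing across them is expensive, which is what preserves features.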
GPU Acceleration
Reconstruction is embarrassingly parallel:
- Each depth pixel updates independent voxels
- Each voxel processes independently
- Marching cubes per-voxel
V1: CPU implementation (slow, eats power)
V2: GPU compute pipeline
Expected improvement: 10x throughput at similar power.
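To illustrate why this parallelizes, the sketch below runs the same per-voxel TSDF update over disjoint chunks on CPU threads as a stand-in for a GPU dispatch (names are illustrative). Because no two work items touch the same voxel, no synchronization is needed and the chunked result matches the serial one exactly:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def integrate_slice(tsdf, weight, sdf, trunc, lo, hi):
    """One work item: updates only voxels in [lo, hi). Reads and writes
    its own slice exclusively, so items need no locks; the same shape
    maps onto a GPU compute dispatch with one thread per voxel."""
    s = slice(lo, hi)
    band = sdf[s] > -trunc                     # truncation band only
    d = np.clip(sdf[s] / trunc, -1.0, 1.0)
    w = weight[s]
    tsdf[s] = np.where(band, (w * tsdf[s] + d) / (w + 1.0), tsdf[s])
    weight[s] = np.where(band, w + 1.0, w)

def integrate_parallel(tsdf, weight, sdf, trunc, chunks=4):
    """Partition the voxel range and update the chunks concurrently."""
    n = len(tsdf)
    step = (n + chunks - 1) // chunks
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        for lo in range(0, n, step):
            pool.submit(integrate_slice, tsdf, weight, sdf, trunc,
                        lo, min(lo + step, n))
```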
Learned Completion
Depth sensors don't see everything (occlusions, range limits, specular surfaces). Can we complete what's missing?
Approaches:
- Geometric priors: Planes extend, rooms have floors/ceilings
- Learned completion: Neural network predicts unobserved geometry
- Semantic reasoning: If it's a chair, it probably has four legs
We're prototyping learned completion:
- Train on complete 3D models
- Input: partial observation
- Output: completed mesh
Early results promising for common objects. Generalization to arbitrary scenes is harder.
Memory Management
Room-scale at 1cm resolution means hundreds of millions of voxels if stored densely: a 10m × 10m × 3m space is 3 × 10⁸ cells, and larger volumes push into the billions.
Solutions:
- Hierarchical structures: Only subdivide where needed
- LRU caching: Keep recent observations, page out old areas
- Level-of-detail: High resolution near user, coarse far away
Memory budget: 500MB for reconstruction.
Target: kitchen-sized space at 1cm with dynamic updates.
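A minimal sketch of the LRU strategy, assuming the volume is split into fixed-size voxel blocks (e.g. 16³ bricks of TSDF). The class and sizes are illustrative; the real page-out path would write evicted blocks to storage rather than a list:

```python
from collections import OrderedDict

class BlockCache:
    """LRU cache for fixed-size voxel blocks.

    Keeps the most recently touched blocks resident (the area around
    the user); least-recently-used blocks are evicted to stay inside
    the memory budget.
    """

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()   # block coordinate -> voxel data
        self.evicted = []             # stand-in for the page-out path

    def touch(self, coord, make_block):
        """Fetch the block at `coord`, creating it on first touch and
        evicting the least-recently-used block if over budget."""
        if coord in self.blocks:
            self.blocks.move_to_end(coord)   # mark as recently used
        else:
            self.blocks[coord] = make_block()
            if len(self.blocks) > self.max_blocks:
                old, _ = self.blocks.popitem(last=False)  # evict LRU
                self.evicted.append(old)
        return self.blocks[coord]
```

Every depth-frame integration touches the blocks it updates, so the resident set naturally tracks where the user is looking.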