Transfer Learning for Perception: Sim-to-Real and Beyond
Techniques for transferring knowledge from synthetic to real data, and from one perception task to another.
Training on synthetic data, deploying on real devices. The transfer problem is central to our synthetic data strategy.
The Transfer Challenge
Model trained on synthetic data: 95% accuracy on the synthetic test set. Same model on real data: 72% accuracy.
This 23-point gap is the sim-to-real gap. Closing it is the game.
Domain Adaptation Techniques
Feature Alignment
Force the network to learn domain-invariant features:
Adversarial training: Add a discriminator that tries to distinguish synthetic from real features. Train the encoder to fool it.
                ┌──────────────┐
Image ─────────►│   Encoder    │───► Features ───► Task Head ───► Prediction
                └──────────────┘         │
                                         ▼
                               ┌─────────────────┐
                               │  Discriminator  │
                               │  (syn vs real)  │
                               └─────────────────┘
Loss = TaskLoss - λ × DomainLoss
The subtraction makes the encoder adversarial to the discriminator.
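Numerically, the sign flip means the encoder's objective improves as the discriminator's loss rises. A toy sketch of the two objectives (the loss values are hypothetical; `lam` is the λ in the formula above):

```python
def encoder_loss(task_loss: float, domain_loss: float, lam: float = 0.1) -> float:
    """Encoder minimizes task loss while *maximizing* the discriminator's
    domain-classification loss, hence the subtraction."""
    return task_loss - lam * domain_loss

def discriminator_loss(domain_loss: float) -> float:
    """The discriminator simply minimizes its own classification loss."""
    return domain_loss

# A chance-level discriminator (high domain loss) means the features are
# domain-invariant, so the encoder's loss is lower than when the
# discriminator can confidently tell synthetic from real.
confused = encoder_loss(task_loss=0.5, domain_loss=0.69)  # near-chance discriminator
exposed = encoder_loss(task_loss=0.5, domain_loss=0.10)   # confident discriminator
assert confused < exposed
```

In practice the subtraction is usually implemented with a gradient reversal layer between the encoder and the discriminator, so both can be trained with ordinary minimization.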
Maximum Mean Discrepancy (MMD): Minimize statistical distance between feature distributions.
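An MMD penalty can be estimated directly from two feature batches. A minimal NumPy sketch, assuming an RBF kernel with a hand-picked bandwidth `gamma` (in practice the bandwidth is often chosen by the median heuristic):

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 0.1) -> np.ndarray:
    # Pairwise squared distances -> RBF kernel matrix.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x: np.ndarray, y: np.ndarray, gamma: float = 0.1) -> float:
    """Biased estimate of squared MMD between feature batches x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
syn = rng.normal(0.0, 1.0, size=(200, 8))    # "synthetic" features
real = rng.normal(0.5, 1.0, size=(200, 8))   # "real" features, shifted mean
same = rng.normal(0.0, 1.0, size=(200, 8))   # same distribution as syn

# A distribution shift shows up as a larger MMD.
assert mmd2(syn, real) > mmd2(syn, same)
```

Adding `mmd2` between synthetic and real feature batches as a training penalty pulls the two feature distributions together without needing a discriminator.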
Self-Training
Use the model to label unlabeled real data, then train on the pseudo-labels:
- Train on synthetic (labeled)
- Apply to real (unlabeled), get predictions
- Filter high-confidence predictions
- Retrain on synthetic + pseudo-labeled real
- Repeat
Each iteration improves real-domain performance.
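The filtering step above amounts to a confidence threshold over per-class probabilities. A sketch (the `pseudo_label` helper and the 0.9 threshold are illustrative, not a fixed API):

```python
def pseudo_label(probs, threshold=0.9):
    """Keep only high-confidence predictions as pseudo-labels.

    probs: list of per-class probability lists, one per unlabeled real image.
    Returns (sample index, predicted class) pairs that clear the threshold.
    """
    kept = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            kept.append((i, p.index(conf)))
    return kept

# Three unlabeled real images: two confident predictions, one ambiguous.
probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]]
assert pseudo_label(probs) == [(0, 0), (2, 1)]
```

The threshold trades off pseudo-label quantity against quality; too low and label noise compounds across iterations.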
Fine-Tuning with Minimal Real Data
How much real data do you need to close the gap?
Experiments show:
- 0% real: 72% accuracy
- 1% real + 99% synthetic: 85% accuracy
- 10% real + 90% synthetic: 91% accuracy
- 100% real: 93% accuracy
Small amounts of real data provide disproportionate benefit. Collect strategically.
Cross-Task Transfer
Can training on one task help another?
Shared representations: Low-level features (edges, textures) transfer across tasks.
Example: Hand segmentation model → Hand keypoint model
- Pre-train encoder on segmentation (abundant labels)
- Fine-tune full model on keypoints (scarce labels)
Result: 15% better keypoint accuracy from the same amount of keypoint data.
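Mechanically, the pre-train-then-fine-tune recipe is just copying encoder weights into the new model while leaving the task head fresh. A sketch with checkpoints modeled as plain dicts (key names like `encoder.conv1` are hypothetical stand-ins for real checkpoint formats):

```python
def transfer_encoder(src_ckpt: dict, dst_ckpt: dict, prefix: str = "encoder.") -> dict:
    """Copy encoder weights from a source checkpoint into a destination
    model's checkpoint; task-head weights are left to train from scratch."""
    out = dict(dst_ckpt)
    for name, weights in src_ckpt.items():
        if name.startswith(prefix):
            out[name] = weights
    return out

seg = {"encoder.conv1": [0.1, 0.2], "seg_head.fc": [9.9]}   # pre-trained on segmentation
kpt = {"encoder.conv1": [0.0, 0.0], "kpt_head.fc": [0.0]}   # fresh keypoint model

merged = transfer_encoder(seg, kpt)
assert merged["encoder.conv1"] == [0.1, 0.2]  # encoder transferred
assert merged["kpt_head.fc"] == [0.0]         # keypoint head untouched
```

The same pattern works in reverse for any pair of tasks that share a backbone.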
Multi-task learning: Train on multiple tasks simultaneously.
Shared encoder → Multiple heads (segmentation, depth, keypoints)
Benefits:
- Regularization effect
- Efficient use of data
- Single model serves multiple needs
Challenges:
- Task interference (one task hurts another)
- Loss weighting (which tasks matter more?)
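The loss-weighting challenge reduces to choosing the coefficients in a weighted sum of per-task losses. A minimal sketch (task names and weight values are illustrative; schemes like uncertainty weighting tune these automatically):

```python
def multitask_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of per-task losses; picking the weights is the hard part."""
    return sum(weights[task] * loss for task, loss in losses.items())

losses = {"seg": 0.8, "depth": 1.5, "kpt": 0.4}
weights = {"seg": 1.0, "depth": 0.5, "kpt": 2.0}  # e.g. up-weight scarce keypoints

assert abs(multitask_loss(losses, weights) - 2.35) < 1e-9
```

Bad weights are one common source of task interference: a dominant loss term can drag shared features toward one task at the others' expense.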
Practical Pipeline
Our production pipeline:
- Pre-train large model on synthetic data (all the data we can generate)
- Domain adapt using adversarial + self-training (no real labels needed)
- Fine-tune on curated real dataset (expensive to collect)
- Specialize per-device if calibration data available
Each stage improves real-world performance.
Measuring Transfer
Metrics we track:
- Absolute gap: Real accuracy - Synthetic accuracy
- Transfer ratio: (Real accuracy with transfer) / (Real accuracy with real training)
- Data efficiency: Real samples needed to reach target accuracy
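All three metrics are simple to compute. A sketch using the accuracy numbers from this section (the sample counts passed to `data_efficiency` are hypothetical):

```python
def absolute_gap(real_acc: float, syn_acc: float) -> float:
    """Real accuracy minus synthetic accuracy; negative means a sim-to-real gap."""
    return real_acc - syn_acc

def transfer_ratio(real_acc_transfer: float, real_acc_real_trained: float) -> float:
    """Accuracy from transfer divided by accuracy from training on real data."""
    return real_acc_transfer / real_acc_real_trained

def data_efficiency(acc_by_n: dict, target: float):
    """Smallest real-sample count reaching the target accuracy (None if never)."""
    for n, acc in sorted(acc_by_n.items()):
        if acc >= target:
            return n
    return None

# 95% synthetic accuracy vs 72% real accuracy, from the section above.
assert abs(absolute_gap(0.72, 0.95) - (-0.23)) < 1e-9

# Zero-real model (72%) vs fully real-trained model (93%).
assert round(transfer_ratio(0.72, 0.93), 2) == 0.77

# Hypothetical sample counts for the fine-tuning sweep.
assert data_efficiency({0: 0.72, 100: 0.85, 1000: 0.91}, target=0.90) == 1000
```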
Our hand tracking model: 0.85 transfer ratio with zero real data. With 10K real images: 0.97 transfer ratio.
Synthetic data is working.