
Eye Tracking in AR: Technical Challenges and Approaches

Building robust eye tracking for mixed reality - from pupil detection to gaze estimation to the unique challenges of see-through displays.

Evyatar Bluzer
3 min read

Eye tracking enables the next level of AR interaction: foveated rendering, natural UI, social presence. It's also one of the hardest perception problems on the headset.

Why Eye Tracking is Hard

The eye is a moving target - saccades (fast eye movements) reach 500°/s. Your tracker needs to keep up.

Variable conditions - pupil diameter varies from roughly 2 mm to 8 mm with lighting. Makeup, glasses, and contact lenses add further variation.

Near-eye optics - unlike webcam eye tracking, we're millimeters from the eye. Extreme wide-angle distortion.

Occlusion - eyelids, eyelashes, reflections from the display all interfere.

Biometric sensitivity - iris patterns are unique identifiers. Privacy constraints apply.

The Eye Tracking Pipeline

IR Illumination → Eye Camera → Pupil Detection →
Glint Detection → Gaze Estimation → Filtering/Prediction

IR Illumination

Multiple IR LEDs create "glints" (corneal reflections) that provide geometric reference points.

Pupil Detection

Find the pupil ellipse in the eye image. Challenges:

  • Variable size and shape
  • Partial occlusion by eyelids
  • Reflections from display

Classical approach: edge detection + ellipse fitting.
Learning approach: trained pupil segmentation network.
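The classical route can be sketched in a few lines. This toy example (synthetic image, illustrative threshold) finds the dark pupil region by intensity thresholding and estimates the ellipse center and axes from the mask's image moments - a moment-based stand-in for full edge detection + ellipse fitting:

```python
import numpy as np

def detect_pupil(img, dark_thresh=50):
    """Classical pupil detection sketch: threshold the dark pupil region,
    then estimate ellipse center/axes from the mask's image moments."""
    ys, xs = np.nonzero(img < dark_thresh)       # dark pixels = pupil candidates
    if len(xs) == 0:
        return None
    cx, cy = xs.mean(), ys.mean()                # ellipse center = centroid
    cov = np.cov(np.stack([xs, ys]))             # 2x2 covariance of the mask
    evals, _ = np.linalg.eigh(cov)               # ascending eigenvalues
    # For a filled ellipse, semi-axis = 2 * sqrt(eigenvalue of covariance)
    minor, major = 2.0 * np.sqrt(evals)
    return (cx, cy), (major, minor)

# Synthetic eye image: bright sclera (200) with a dark pupil disk (20)
h, w = 120, 160
yy, xx = np.mgrid[0:h, 0:w]
img = np.full((h, w), 200, dtype=np.uint8)
img[(xx - 80)**2 + (yy - 60)**2 < 15**2] = 20

center, axes = detect_pupil(img)
print(center)  # ≈ (80.0, 60.0)
```

A production system would add edge refinement and outlier rejection on top; the moment trick above breaks down under the eyelid occlusion and display reflections listed above, which is exactly where the segmentation networks earn their keep.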

Glint Detection

Corneal reflections of the IR LEDs. Their positions relative to the pupil center indicate gaze direction.

Problem: display reflections create false glints. True glints can be discriminated by modulating the LEDs in known temporal patterns - a real glint's brightness follows its LED's pattern across frames, while a display reflection does not.
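Here is a minimal sketch of that discrimination idea, under the assumption that each LED blinks with a known binary code across frames (the codes and thresholds are illustrative). A candidate spot's brightness trace is correlated against each LED's code; a roughly constant trace, typical of a display reflection, matches nothing:

```python
import numpy as np

# Hypothetical binary blink codes, one row per IR LED, one column per frame
LED_CODES = np.array([
    [1, 0, 1, 0, 1, 0, 1, 0],   # LED 0: alternating on/off
    [1, 1, 0, 0, 1, 1, 0, 0],   # LED 1: slower pattern
])

def classify_glint(trace, codes=LED_CODES, min_corr=0.8):
    """Return the index of the matching LED, or None for a false glint."""
    trace = trace - trace.mean()
    if np.allclose(trace, 0):
        return None                          # constant spot: display reflection
    corrs = []
    for code in codes:
        c = code - code.mean()
        corrs.append(np.dot(trace, c) /
                     (np.linalg.norm(trace) * np.linalg.norm(c)))
    best = int(np.argmax(corrs))
    return best if corrs[best] >= min_corr else None

true_glint = np.array([200, 40, 210, 35, 205, 45, 198, 38], float)  # follows LED 0
reflection = np.array([120, 118, 121, 119, 120, 122, 118, 121], float)
print(classify_glint(true_glint))   # 0
print(classify_glint(reflection))   # None
```

The cost of this scheme is temporal: you need several frames per classification, which trades against the latency budget discussed later.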

Gaze Estimation

Model-based: Fit a 3D eye model (cornea as sphere, pupil as disk). Estimate gaze as optical axis.

  • Requires calibration per user
  • Robust once calibrated
  • Handles glasses poorly

Appearance-based (learned): Direct regression from eye image to gaze vector.

  • Needs large training data
  • Can handle more variation
  • May not generalize to unseen conditions

Hybrid: Model-based geometry + learned refinement. Our current direction.
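The hybrid structure can be sketched on synthetic data (all numbers and the toy geometric model here are illustrative, not our actual pipeline): a fixed geometric stage maps the pupil-glint vector to a coarse gaze angle, and a small learned stage - here just an affine least-squares fit - absorbs the residual:

```python
import numpy as np

def model_gaze(pupil_glint_vec, gain=0.12):
    """Toy model-based stage: gaze angle (deg) proportional to the
    pupil-center-minus-glint offset in pixels. Gain is illustrative."""
    return gain * pupil_glint_vec

rng = np.random.default_rng(0)
offsets = rng.uniform(-100, 100, size=(50, 2))       # pupil-glint vectors (px)
true_gaze = 0.1 * offsets + np.array([1.5, -0.8])    # synthetic ground truth (deg)

coarse = model_gaze(offsets)                         # geometric estimate
residual = true_gaze - coarse

# Learned refinement stage: affine fit of residual vs. offset
A = np.hstack([offsets, np.ones((50, 1))])
W, *_ = np.linalg.lstsq(A, residual, rcond=None)
refined = coarse + A @ W

print(np.abs(true_gaze - coarse).mean())    # coarse error, ~1 deg
print(np.abs(true_gaze - refined).mean())   # near zero after refinement
```

In a real system the refinement stage would be a small network over richer features, but the division of labor is the same: geometry carries the structure, learning mops up what the model misses.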

The Calibration Problem

Users have different:

  • Eye shapes
  • Kappa angle (visual axis vs optical axis)
  • Head-eye geometry

Standard solution: 5-9 point calibration where user looks at known targets.
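A common way to consume those calibration points (assuming, for illustration, a second-order polynomial mapping - one standard choice) is a least-squares fit from raw pupil-glint features to screen gaze. A 9-point sketch on synthetic data:

```python
import numpy as np

def poly_features(xy):
    """Second-order polynomial feature expansion of raw (x, y) features."""
    x, y = xy[:, 0], xy[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)

def calibrate(raw, targets):
    """Least-squares fit: targets ≈ poly_features(raw) @ C."""
    C, *_ = np.linalg.lstsq(poly_features(raw), targets, rcond=None)
    return C

def apply_calibration(raw, C):
    return poly_features(raw) @ C

# 3x3 grid of raw measurements while the user fixates known targets;
# the "user-specific" mapping here is synthetic
raw = np.array([[x, y] for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)])
targets = 10.0 * raw + np.array([2.0, -1.0])

C = calibrate(raw, targets)
pred = apply_calibration(np.array([[0.5, -0.5]]), C)
print(pred)   # ≈ [[7.0, -6.0]]
```

Implicit calibration amounts to solving for the same coefficients without the explicit target grid - substituting priors over natural gaze behavior for the known fixation points.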

UX problem: users hate calibration. It's boring, takes time, and must be repeated.

We're researching implicit calibration - inferring the calibration parameters from natural gaze behavior over time.

Foveated Rendering Requirements

To save rendering compute, only render full detail where the user is looking.

Requires:

  • Latency under 10ms from eye movement to render adjustment
  • Accuracy under 1° to avoid visible quality transitions
  • Prediction of saccade endpoints (because saccades are faster than rendering)

This is aggressive. Our current system achieves ~15ms latency. Getting below 10ms requires tight integration with the display pipeline.
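Saccade endpoint prediction can be illustrated with a velocity-threshold detector plus a symmetric-profile assumption (gaze travels about as far after peak velocity as before it). Both the threshold and the symmetry assumption are simplifications for this sketch, not tuned values:

```python
import numpy as np

SACCADE_THRESH = 100.0   # deg/s onset threshold (illustrative)

def predict_endpoint(t, gaze):
    """gaze: 1D array of gaze angles (deg) sampled at times t (s).
    Returns predicted landing angle for an in-flight saccade,
    or the current gaze during fixation."""
    vel = np.gradient(gaze, t)
    moving = np.abs(vel) > SACCADE_THRESH
    if not moving.any():
        return gaze[-1]                    # fixation: render where we look
    onset = int(np.argmax(moving))         # first sample above threshold
    peak = onset + int(np.argmax(np.abs(vel[onset:])))
    start = gaze[onset]
    # Symmetric-profile assumption: endpoint ≈ start + 2 * (peak_pos - start)
    return start + 2.0 * (gaze[peak] - start)

# Synthetic 20° saccade over 50 ms with a raised-cosine position profile
t = np.linspace(0, 0.05, 51)                   # sampled at 1 kHz
gaze = 10.0 * (1 - np.cos(np.pi * t / 0.05))   # 0° → 20°
print(predict_endpoint(t, gaze))               # ≈ 20°
```

The payoff: once the predictor commits to a landing point mid-saccade, the renderer can move the foveal region before the eye arrives, hiding most of the end-to-end latency behind the saccade itself (during which vision is suppressed anyway).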

More on the optics side of eye tracking next month.
