August 2025 · 15 min
Monosemantic Features in Vision-Language-Action Models
We trained transcoders on π₀-FAST to uncover interpretable features in robotics models, revealing two classes: "scene" features that highlight objectives and "ego" features that focus on the robot itself.
John Newsom, Cody Swain