Humanoid Robot Learning: From Foundations to Visual Perception
Carlo Sferrazza
University of Texas at Austin & Amazon Frontiers AI and Robotics, US
Abstract
Humanoid robots represent the ideal physical embodiment to assist us across the diversity of daily tasks and human-centered environments. Driven by major advances in hardware and artificial intelligence, and by the growing demand for adaptable automation, this vision appears increasingly within reach. Yet humanoid intelligence remains far from the general-purpose capabilities we ultimately seek. In this lecture, I will discuss the unique challenges humanoids pose for robot learning and present approaches to scale learning through novel tools (HumanoidBench, MuJoCo Playground, Holosoma), flexible algorithms (OmniRetarget, FastSAC), and expressive architectures (Body Transformer). I will then focus on how perceptive visual policies can unlock new humanoid capabilities, with examples including adaptive parkour and whole-body interaction (PHP), perceptive terrain traversal (RPL), and visually realistic simulation for improved sim-to-real transfer (GaussGym).
Computer Vision for Spatial and Physical Intelligence