ICVSS Computer Vision for Spatial and Physical Intelligence

Humanoid Robot Learning: From Foundations to Visual Perception

Carlo Sferrazza

University of Texas at Austin & Amazon Frontiers AI and Robotics, US

Abstract

Humanoid robots represent the ideal physical embodiment to assist us across the diversity of daily tasks and human-centered environments. Driven by major advances in hardware and artificial intelligence, and by the growing demand for adaptable automation, this vision appears increasingly within reach. Yet humanoid intelligence remains far from the general-purpose capabilities we ultimately seek. In this lecture, I will discuss the unique challenges humanoids pose for robot learning and present approaches to scale learning through novel tools (HumanoidBench, MuJoCo Playground, Holosoma), flexible algorithms (OmniRetarget, FastSAC), and expressive architectures (Body Transformer). I will then focus on how perceptive visual policies can unlock new humanoid capabilities, with examples including adaptive parkour and whole-body interaction (PHP), perceptive terrain traversal (RPL), and visually realistic simulation for improved sim-to-real transfer (GaussGym).