ICVSS Computer Vision for Spatial and Physical Intelligence

Reasoning Models for Physical AI: From Fundamentals to Real-World Applications

Marco Pavone

Stanford University & NVIDIA, US

Abstract

Reasoning models for Physical AI aim to equip embodied agents, such as robots and autonomous vehicles, with the ability to perceive, interpret, and act in the real world using contextual understanding and causal reasoning. Recent advances in reasoning-based vision–language–action (VLA) architectures show how integrating multimodal reasoning traces with action outputs enables systems to tackle complex, rare, and safety-critical scenarios that traditional perception–planning pipelines struggle with. This talk provides an overview of foundational concepts in reasoning-centric AI for physical systems, highlighting the emergence of chain-of-thought reasoning in autonomous driving models such as NVIDIA’s Alpamayo family, which bridges human-like reasoning with trajectory planning to improve handling of long-tail scenarios. We discuss model design principles, training strategies that promote causal reasoning, tools and datasets for development and evaluation, and open challenges at the intersection of physical reasoning, safety, and real-world deployment. Attendees will gain a structured understanding of the current landscape and practical guidance for advancing research and applications in reasoning models for Physical AI.