ICVSS Computer Vision for Spatial and Physical Intelligence

Spatial Intelligence: The New Frontier of Computer Vision

Andrea Vedaldi

University of Oxford, UK

Abstract

Despite the tremendous progress of large language models, even the most advanced AIs remain poorly grounded in the physical world. In fact, machines still lack robust Spatial Intelligence, understood as the ability to solve problems that require an understanding of complex spatial phenomena. This gap is a primary obstacle to building general-purpose robots and explains many other limitations of current AIs, such as the tendency to generate videos that are not physically plausible. In this lecture, I will argue that the development of Spatial Intelligence is one of the most important goals of future computer vision research. I will then discuss two major areas of development that are propelling this field forward. The first is the development of 3D (and 4D) foundation models. I will, in particular, review the outstanding progress of machine learning applied to visual geometry. The second is the development of new latent spaces that represent 3D geometry robustly and compactly, and that can thus be an important component in integrating 3D vision into multi-modal language models.