ICVSS Computer Vision for Spatial and Physical Intelligence

Dynamic Humans: Generating 3D Human Motion with Language

Gul Varol

École des Ponts ParisTech, FR

Abstract

This lecture will describe works bridging natural language and 3D human motion. In particular, we will look at the evolution of text-driven generative models: for example, given a text such as "jump forward with your arms raised", can we synthesize a corresponding 3D human motion? This is a relatively recent field that has grown rapidly. I will summarize some of the key works based on variational autoencoders (VAEs) and diffusion models, with a special emphasis on compositionality to handle fine-grained textual descriptions. In the last part, I will also show results of follow-up works on text-to-motion retrieval (CLIP-like models for 3D motions), text-based motion editing, and 3D hand motion generation.