Recreating the Physical Natural World from Images
Packard 101
Abstract: Today's generative AI models excel at recreating the visual world through pixels, but they often struggle to comprehend basic physical concepts such as 3D shape, motion, material, and lighting. These are the key elements that connect computer vision to a wide range of engineering disciplines and real-world applications, including interactive VR, robotics, and scientific analysis. A major roadblock has been the difficulty of collecting large-scale datasets of physical measurements for training. In this talk, I will discuss an alternative approach based on inverse rendering, which enables machine learning models to extract explicit physical representations from raw, unstructured image data, such as Internet photos and videos. This approach circumvents the need for direct physical measurements, allowing us to model a wide variety of 3D objects in nature, including diverse wildlife, using only casually recorded imagery. The resulting models instantly turn images into physically grounded 3D assets and controllable animations, ready for downstream rendering and analysis.
Bio: Shangzhe Wu is a postdoctoral researcher at Stanford University, working with Jiajun Wu. He will join the Department of Engineering at the University of Cambridge in spring 2025. He received his PhD from the University of Oxford, where he was advised by Andrea Vedaldi and Christian Rupprecht. His research focuses on unsupervised 3D perception and inverse rendering. His work received the Best Paper Award at CVPR 2020 and the BMVA Sullivan Doctoral Thesis Prize.