Interactive Digital Twins and Generative Worlds

DEMO

Hongchi Xia, Shenlong Wang

Embodied AI urgently requires high-quality simulation environments that are physically grounded, visually realistic, and truly interactive. Such environments must simulate physical laws like gravity and friction, provide high-resolution dynamic scenes, and support agents in perceiving, manipulating, and reasoning while learning from feedback.

This poster and demo will present three research contributions toward this goal. In the **digital twin** direction, **DRAWER** (CVPR 2025) reconstructs high-fidelity digital twins with articulated structures, such as drawers and cabinet doors, from real-world images, enabling fine-grained modeling of articulated objects. **HoloScene** (NeurIPS 2025) automatically generates simulation-ready, interactive 3D worlds from a single video, substantially reducing construction cost. In the **generative scene** direction, **SAGE** (CVPR 2026) proposes a scalable, agent-driven 3D scene generation framework that supplies rich, diverse, high-quality environments for embodied AI training. Together, these works establish a complete pipeline, from real-world data capture to large-scale scene generation, laying a solid foundation for training and evaluating embodied AI systems.
