Dress&Dance: Dress up and Dance as You Like It


Jun-Kun Chen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang

We present Dress&Dance, a video diffusion framework for immersive virtual try-on, designed to bridge the gap between current system capabilities and the experience users expect from realistic digital fitting. Our method targets three key requirements for an immersive try-on experience: intuitive motion control, flexible conditioning for diverse try-on scenarios, and native high-resolution, high-frame-rate generation.

Dress&Dance formulates try-on by decoupling user appearance from motion control. Given a single user photo, garment image(s), and a motion reference video, it generates a try-on video in which the user wears the target garment(s) while following the reference motion. This enables users to control motion simply by selecting or swapping a motion template, without recording a full-body personal video.
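For concreteness, the interface this formulation implies can be sketched as below. The function name, tensor shapes, and the `TryOnInputs` container are illustrative assumptions, not the released API; only the input/output contract is taken from the text above.

```python
# Hypothetical interface sketch of the try-on formulation.
# `generate_tryon`, `TryOnInputs`, and all shapes are assumptions
# for illustration; they are not Dress&Dance's released API.
from dataclasses import dataclass

import torch


@dataclass
class TryOnInputs:
    user_photo: torch.Tensor      # (3, H, W): a single photo of the user
    garments: list[torch.Tensor]  # one or more (3, H, W) garment images
    motion_video: torch.Tensor    # (T, 3, H, W): motion reference video


def generate_tryon(model, inputs: TryOnInputs) -> torch.Tensor:
    """Return a (T, 3, H, W) video of the user wearing the garment(s)
    while following the reference motion. Appearance comes only from
    `user_photo` and `garments`; dynamics come only from `motion_video`,
    so swapping the motion template changes the dance, not the person."""
    return model(inputs.user_photo, inputs.garments, inputs.motion_video)
```

The key property of this decoupling is that the motion template is just another input: replacing `motion_video` retargets the same user and garments to a new motion without any per-user video capture.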

To support heterogeneous and pixelwise-unaligned inputs across changing try-on settings, we introduce CondNets, an attention-based conditioning architecture that unifies all conditions into a shared token space, together with garment-aware target steering for accurate garment placement. To enable high-fidelity generation at scale, we build a high-resolution dataset and benchmark, leverage video-image hybrid training, synthesize unpaired triplets for supervision, and adopt a multi-stage curriculum that progressively increases resolution and frame rate.
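As a rough illustration of the attention-based conditioning pattern described above (not the paper's actual CondNets implementation), each condition can be encoded into tokens of a shared width and concatenated into one context sequence that the denoiser attends to; all module names, dimensions, and the usage values below are assumptions.

```python
# Minimal sketch of attention-based conditioning over heterogeneous,
# pixelwise-unaligned inputs: every condition is projected into a
# shared token space and concatenated into one context sequence.
# Illustrative pattern only; not the paper's CondNets code.
import torch
import torch.nn as nn


class UnifiedConditioner(nn.Module):
    def __init__(self, dims: dict[str, int], d_model: int = 1024):
        super().__init__()
        # One projection per condition type (user photo, garment, motion, ...)
        self.proj = nn.ModuleDict(
            {name: nn.Linear(d, d_model) for name, d in dims.items()}
        )
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, latents: torch.Tensor, conds: dict[str, torch.Tensor]):
        # conds[name]: (B, N_name, dims[name]) tokens from each condition encoder.
        # Project everything into the shared token space and concatenate;
        # attention absorbs the lack of pixelwise alignment between inputs.
        ctx = torch.cat([self.proj[k](v) for k, v in conds.items()], dim=1)
        out, _ = self.attn(latents, ctx, ctx)  # denoiser tokens attend to conditions
        return latents + out


# Usage: mixed conditions with different token counts and feature widths.
cond = UnifiedConditioner({"user": 768, "garment": 768, "motion": 512})
latents = torch.randn(2, 256, 1024)
video_tokens = cond(latents, {
    "user": torch.randn(2, 64, 768),
    "garment": torch.randn(2, 128, 768),
    "motion": torch.randn(2, 300, 512),
})
```

Because every condition lands in the same token space, adding or dropping a condition (e.g., multiple garments) only changes the length of the context sequence, which is what makes flexible try-on settings cheap to support.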

Experiments show that Dress&Dance produces high-fidelity, temporally coherent try-on videos across diverse modes, moving virtual try-on closer to a truly immersive experience.

https://immortalco.github.io/DressAndDance/
