Large and high-quality motion datasets are essential for advancing human motion modeling. However, limitations of existing motion datasets, such as insufficient scale or inadequate quality, significantly hinder progress in this field. To address these limitations, we introduce Mimic-X, a large-scale (52 hours), physically plausible 3D human motion dataset. To construct Mimic-X, we develop an adaptive option framework that controls a physically simulated character to imitate low-quality motions extracted from a vast collection of online videos. Specifically, we first apply hierarchical clustering to group motions into clusters, and then train option policies to mimic motions sampled from these clusters. Given the noisy nature of the low-quality motions, we use a separate encoder for each cluster to map the noisy motions within that cluster into a compact latent space. This significantly enhances the quality of the imitated motions while accelerating learning. Subsequently, we employ dynamic programming as a meta-policy to efficiently organize the option policies to generate complete motion clips. Finally, we fine-tune each motion sequence to further refine its quality. The proposed adaptive option framework outperforms state-of-the-art human motion recovery methods across various evaluation metrics, demonstrating that motions in Mimic-X exhibit higher quality and greater physical plausibility. Furthermore, experimental results show that Mimic-X improves the performance of motion generation methods, verifying its effectiveness for motion modeling tasks.
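As a rough illustration of the grouping step described above, the minimal sketch below clusters motion clips hierarchically and reports cluster sizes. The feature descriptor, cluster count, and linkage choice are assumptions made for this example only, not the settings used to build Mimic-X; in the actual pipeline, each resulting cluster would receive its own encoder and option policy.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def clip_features(clips):
    # Summarize each clip (T x J x 3 joint positions) by per-dimension mean and
    # standard deviation; a stand-in for whatever descriptor the real pipeline uses.
    feats = []
    for clip in clips:
        flat = clip.reshape(clip.shape[0], -1)  # T x (J*3)
        feats.append(np.concatenate([flat.mean(axis=0), flat.std(axis=0)]))
    return np.stack(feats)

# Toy data: 100 random "clips" of 60 frames and 24 joints (placeholders only).
rng = np.random.default_rng(0)
clips = [rng.normal(size=(60, 24, 3)) for _ in range(100)]

features = clip_features(clips)
labels = AgglomerativeClustering(n_clusters=8, linkage="ward").fit_predict(features)
print(np.bincount(labels))  # cluster sizes; each cluster gets its own encoder and option policy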
Demos included in the main paper are shown below.
System overview. Our system comprises three main steps: preprocessing, pre-training, and adaptation. In the preprocessing step, we extract kinematic motion data from videos, resulting in a low-quality motion dataset. During the pre-training step, we apply hierarchical clustering to group the low-quality dataset into several clusters and train option policies to mimic motions sampled from these clusters using deep reinforcement learning. Finally, in the adaptation step, an option selector arranges the option policies to mimic the reference motion (a toy sketch of such an option selector follows the demos below). After fine-tuning for several minutes, we obtain high-quality simulated motions.
Generated results from T2M-GPT, T2M-GPT-Motion-X, and T2M-GPT-Mimic-X.
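The option-selection step in the adaptation stage can be illustrated with a Viterbi-style dynamic program: given per-segment imitation costs for each option policy, choose one option per reference segment while penalizing switches. The cost matrix and switch penalty below are hypothetical inputs, and this sketch is only an assumed illustration of organizing options via dynamic programming, not the meta-policy implementation used for Mimic-X.

import numpy as np

def select_options(cost, switch_penalty=1.0):
    # cost[t, k]: imitation cost of option policy k on reference segment t.
    # Returns the option index chosen for each segment.
    T, K = cost.shape
    dp = np.zeros((T, K))            # dp[t, k]: best total cost ending in option k at segment t
    back = np.zeros((T, K), dtype=int)
    dp[0] = cost[0]
    for t in range(1, T):
        # Cost of arriving at option k from option j, penalizing j != k.
        trans = dp[t - 1][:, None] + switch_penalty * (1 - np.eye(K))
        back[t] = trans.argmin(axis=0)
        dp[t] = trans.min(axis=0) + cost[t]
    # Trace back the cheapest option sequence.
    path = [int(dp[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 6 reference segments, 3 option policies.
rng = np.random.default_rng(1)
print(select_options(rng.random((6, 3)), switch_penalty=0.2))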
@inproceedings{tao2026mimicx,
title={Mimic-X: A Large-Scale Motion Dataset via Fast Physics-Based Controller Adaptation},
author={Tao, Hongyu and Hou, Shuaiying and Fang, Junheng and Shi, Mingyao and Xu, Weiwei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}