WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction

Zilong Wang1, Zhiyang Dou2, Yuan Liu3, Cheng Lin2, Xiao Dong4, Yunhui Guo1, Chenxu Zhang1, Xin Li5, Wenping Wang5, Xiaohu Guo1
1The University of Texas at Dallas, 2The University of Hong Kong, 3The Hong Kong University of Science and Technology, 4BNU-HKBU United International College, 5Texas A&M University
Method

Pipeline of our WonderHuman.

Abstract

In this paper, we present WonderHuman, a method for reconstructing dynamic human avatars from monocular video for high-fidelity novel view synthesis. Previous dynamic human avatar reconstruction methods typically require the input video to fully cover the observed human body. In daily practice, however, one typically has access to limited viewpoints, such as monocular front-view videos, which makes reconstructing the unseen parts of the human avatar a cumbersome task for previous methods. To tackle this issue, WonderHuman leverages 2D generative diffusion model priors to achieve high-quality, photorealistic reconstructions of dynamic human avatars from monocular videos, including accurate rendering of unseen body parts. Our approach introduces a Dual-Space Optimization technique that applies Score Distillation Sampling (SDS) in both the canonical and observation spaces to ensure visual consistency and enhance realism in dynamic human reconstruction. Additionally, we present a View Selection strategy and Pose Feature Injection to enforce consistency between the SDS predictions and the observed data, ensuring pose-dependent effects and higher fidelity in the reconstructed avatar. In our experiments, our method achieves state-of-the-art performance in producing photorealistic renderings from a given monocular video, particularly for challenging unseen parts.
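To make the Dual-Space Optimization idea concrete, below is a minimal sketch of one SDS step applied to a rendered image, with the same loss evaluated on a canonical-space rendering and an observation-space rendering. The ToyDenoiser, the timestep schedule, and the equal weighting of the two space terms are illustrative assumptions standing in for the pretrained 2D diffusion prior; this is not our exact implementation.

    import torch
    import torch.nn.functional as F

    # Hypothetical stand-in for a pretrained 2D diffusion prior.
    # The real model predicts the noise added to an image at timestep t.
    class ToyDenoiser(torch.nn.Module):
        def __init__(self, channels=3):
            super().__init__()
            self.net = torch.nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, noisy_image, t):
            return self.net(noisy_image)

    def sds_loss(render, denoiser, alphas_cumprod, t):
        """One Score Distillation Sampling step on a rendered image.

        The rendering is treated as x0: we noise it to timestep t, ask the
        diffusion prior to predict that noise, and backpropagate the
        residual (eps_hat - eps) into the renderer only. The usual w(t)
        weighting is omitted for brevity.
        """
        eps = torch.randn_like(render)
        a_t = alphas_cumprod[t]
        noisy = a_t.sqrt() * render + (1 - a_t).sqrt() * eps
        with torch.no_grad():
            eps_hat = denoiser(noisy, t)
        # Detaching the target makes the gradient of this MSE w.r.t.
        # `render` equal to (eps_hat - eps), the SDS gradient.
        target = (render - (eps_hat - eps)).detach()
        return 0.5 * F.mse_loss(render, target, reduction="sum")

    if __name__ == "__main__":
        denoiser = ToyDenoiser()
        alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
        # Stand-ins for differentiable renderings in the two spaces.
        canonical = torch.rand(1, 3, 64, 64, requires_grad=True)
        observed = torch.rand(1, 3, 64, 64, requires_grad=True)
        t = torch.randint(0, 1000, ())
        loss = (sds_loss(canonical, denoiser, alphas_cumprod, t)
                + sds_loss(observed, denoiser, alphas_cumprod, t))
        loss.backward()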



Comparison with Video-based Methods

Comparison with HumanNeRF, Instant-NVR and GaussianAvatar:


Qualitative comparison with HumanNeRF, Instant-NVR and GaussianAvatar on the in-the-wild and MVHumanNet datasets. Our method consistently demonstrates superior performance in addressing the challenges of invisible-part synthesis, excelling in both geometry and appearance reconstruction.


Comparison with GuessTheUnseen:


Qualitative comparison with GuessTheUnseen. Please refer to our paper for quantitative results.


Comparison with Image-based Methods

Comparison with SIFU, SITH, and ELICIT:


Qualitative comparison with SIFU, SITH, and ELICIT. Our method associates texture with different body parts across frames and robustly predicts the correct texture for unseen parts.


Novel Pose Animations


Our method aligns the generated Gaussian human avatars with the SMPL model, enabling us to animate the reconstructed avatar with novel poses.
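As a rough illustration of how SMPL alignment enables reposing, here is a minimal linear blend skinning sketch that warps canonical Gaussian centers with SMPL joint transforms. The per-Gaussian skinning weights (e.g., copied from the nearest SMPL vertex) and the function name are illustrative assumptions rather than our exact implementation; a full system would also rotate each Gaussian's orientation by the blended rotation, while this sketch only moves the centers.

    import torch

    def animate_gaussians(centers_canonical, skin_weights, joint_transforms):
        """Repose canonical 3D Gaussian centers via linear blend skinning.

        centers_canonical: (N, 3) Gaussian centers in the canonical pose.
        skin_weights:      (N, J) per-Gaussian SMPL skinning weights
                           (each row sums to 1).
        joint_transforms:  (J, 4, 4) rigid transforms of the SMPL joints
                           for the target pose (canonical -> posed).
        Returns (N, 3) posed centers.
        """
        n = centers_canonical.shape[0]
        ones = torch.ones(n, 1, device=centers_canonical.device)
        homo = torch.cat([centers_canonical, ones], dim=1)  # (N, 4)
        # Blend the per-joint transforms with the skinning weights: (N, 4, 4).
        blended = torch.einsum("nj,jab->nab", skin_weights, joint_transforms)
        # Apply each blended transform to its Gaussian center.
        posed = torch.einsum("nab,nb->na", blended, homo)
        return posed[:, :3]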


Related Links

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Zero-1-to-3: Zero-shot One Image to 3D Object

ECON: Explicit Clothed humans Optimized via Normal integration

Sapiens: Foundation for Human Vision Models

BibTeX

If you find this work helpful, you can cite our paper as follows:


@misc{wang2025wonderhuman,
      title={WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction}, 
      author={Zilong Wang and Zhiyang Dou and Yuan Liu and Cheng Lin and Xiao Dong and Yunhui Guo and Chenxu Zhang and Xin Li and Wenping Wang and Xiaohu Guo},
      year={2025},
      eprint={2502.01045},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.01045}, 
}

If you have any questions or feedback, please contact Zilong Wang (zlwangyg@gmail.com).