Photorealistic Human Avatar Generation and Real-Time Pose Editing with Gaussian Splatting

Abstract

This thesis introduces a two-part framework for photorealistic human avatar reconstruction and real-time pose editing from monocular video.

The first module, GSDHuman, employs Gaussian Surfels to reconstruct Dynamic Human avatars within minutes, offering efficient human avatar reconstruction from single-view inputs. The second, 3DFilmPCA, is a photorealistic avatar system based on rigged 3D Gaussian Splatting, enabling SMPL-driven Pre-Composite Action control. Users can interactively manipulate SMPL joints and instantly visualize updates on the associated Gaussian representation.

Together, these components form a complete pipeline that accelerates the creation of controllable, high-fidelity human models. The proposed methods are tailored for artist-driven workflows in visual effects, digital filmmaking, and virtual human editing. Unlike conventional techniques that rely on complex geometry fitting or canonical space learning, our approach achieves rapid and high-quality results using only monocular video.

Animation

Interpolating states

We can also animate the scene by interpolating the deformation latent codes of two input frames. Use the slider here to linearly interpolate between the left frame and the right frame.
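As a rough illustration of the interpolation described above, the sketch below linearly blends two latent codes and produces an evenly spaced sequence between them. The names `lerp_latent` and `interpolation_sequence` are hypothetical, and the codes are assumed to be plain fixed-length vectors; passing each blended code to the renderer is not shown, since that interface is not described here.

```python
import numpy as np


def lerp_latent(code_a, code_b, t):
    """Linearly blend two latent codes: t=0 returns code_a, t=1 returns code_b."""
    code_a = np.asarray(code_a, dtype=np.float64)
    code_b = np.asarray(code_b, dtype=np.float64)
    if code_a.shape != code_b.shape:
        raise ValueError("latent codes must have the same shape")
    # Clamp t so slider values outside [0, 1] do not extrapolate.
    t = float(np.clip(t, 0.0, 1.0))
    return (1.0 - t) * code_a + t * code_b


def interpolation_sequence(code_a, code_b, num_steps):
    """Return num_steps codes evenly spaced from code_a to code_b, inclusive."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [lerp_latent(code_a, code_b, t) for t in np.linspace(0.0, 1.0, num_steps)]


if __name__ == "__main__":
    # Toy 4-dimensional codes standing in for the left and right frames (assumed size).
    left = np.zeros(4)
    right = np.ones(4)
    for code in interpolation_sequence(left, right, 5):
        print(code)
```

Each code in the sequence would correspond to one slider position between the left frame and the right frame.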


Re-rendering the input video

Using Nerfies, you can re-render a video from a novel viewpoint such as a stabilized camera by playing back the training deformations.

Related Links

There's a lot of excellent work that was introduced around the same time as ours.

Progressive Encoding for Neural Optimization introduces an idea similar to our windowed position encoding for coarse-to-fine optimization.

D-NeRF and NR-NeRF both use deformation fields to model non-rigid scenes.

Some works model videos with a NeRF by directly modulating the density, such as Video-NeRF, NSFF, and DyNeRF.

There are probably many more by the time you are reading this. Check out Frank Dellaert's survey on recent NeRF papers, and Yen-Chen Lin's curated list of NeRF papers.

Acknowledgements and Funding

Graduate study was supported by a fellowship from Texas A&M University and research funding from the National Science Foundation.

I would like to sincerely thank Xin Li for his guidance, patience, and insightful advice throughout this thesis. I am also grateful to my committee members, John Keyser and Ann McNamara, for their valuable feedback and support.

Special thanks to Zhengming Yu (Texas A&M University) for mentorship and technical discussions that contributed to the development of GSDHuman, particularly in Gaussian Splatting techniques and related system design.


GSDHuman: Gaussian Surfels for Dynamic Human Reconstruction from Monocular Video


3DFilmPCA: Photorealistic Human Avatars with Rigged 3D Gaussian Splatting for SMPL-Driven Pre-Composite Action Control



Video


Shape Editing

Pose Editing
