Meituan open-sources LongCat-Video-Avatar, advancing realistic and consistent AI avatars for long-form video
2025-12-20 22:02:00+08
Meituan’s LongCat Team has open-sourced its latest audio-driven video generation model—LongCat-Video-Avatar—marking a significant leap forward in virtual human technology. Designed specifically for high-fidelity, long-duration avatar synthesis, the model delivers exceptional lip-sync accuracy, natural motion dynamics, and stable identity consistency, drawing strong interest from developers and researchers worldwide.
Built on LongCat-Video, a 13.6B-parameter general-purpose video generation model, LongCat-Video-Avatar natively supports multiple modalities (a hypothetical mode-selection sketch follows the list):
- Audio-Text-to-Video (AT2V)
- Audio-Text-Image-to-Video (ATI2V)
- Audio-conditioned video continuation
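To make the three modes concrete, here is a purely hypothetical sketch of how the conditioning mode could be inferred from the inputs a caller supplies. None of these function or parameter names come from the released code; consult the GitHub README for the actual entry points.

```python
# Hypothetical mode dispatch, for illustration only. `pick_mode` and its
# parameters are assumptions, not the LongCat-Video-Avatar API.
from typing import Optional

def pick_mode(audio: bytes, text: Optional[str] = None,
              image: Optional[bytes] = None,
              prior_video: Optional[bytes] = None) -> str:
    """Map the supplied conditioning signals to one of the three modalities."""
    if prior_video is not None:
        return "audio-conditioned continuation"  # extend an existing clip
    if image is not None:
        return "ATI2V"  # audio + text + reference image
    return "AT2V"       # audio + text only
```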
Compared to its predecessor InfiniteTalk, the new model achieves substantial improvements in motion realism, visual stability, and character identity preservation—addressing key pain points in long-form avatar generation.
A core innovation is the Cross-Chunk Latent Stitching training strategy, which eliminates quality degradation caused by repeated VAE decode-encode cycles during autoregressive generation. By stitching latent features directly in the compressed space, the model maintains pixel fidelity while boosting inference efficiency.
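A minimal sketch of the idea, assuming a chunked diffusion sampler and a video VAE (the `model.sample` and `vae.decode` calls below are illustrative placeholders, not the released API): each chunk is conditioned on the latent tail of its predecessor, and pixels are decoded only once at the end.

```python
import torch

def generate_long_video(model, vae, audio_chunks, overlap: int = 4):
    """Stitch chunks in latent space; decode to pixels exactly once."""
    latents = []     # generated latent chunks, each shaped (T, C, H, W)
    context = None   # latent tail carried over from the previous chunk
    for audio in audio_chunks:
        # Denoise the next chunk conditioned on audio and the latent context;
        # there is no decode->encode round trip between chunks.
        chunk = model.sample(audio=audio, latent_context=context)
        # Assume the sampler re-emits the context frames at the chunk head,
        # so drop them to avoid duplicated frames in the output.
        latents.append(chunk if context is None else chunk[overlap:])
        context = chunk[-overlap:]  # last `overlap` latent frames seed the next chunk
    # A single VAE decode over the full latent sequence preserves pixel fidelity.
    return vae.decode(torch.cat(latents, dim=0))
```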
To ensure consistent identity over extended sequences, LongCat-Video-Avatar introduces Reference Skip Attention combined with position-encoded reference frame injection. This approach effectively anchors character appearance without causing the “copy-paste” rigidity or motion stagnation seen in prior methods.
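One plausible reading of this mechanism, sketched below: tokens of the frames being generated cross-attend to position-encoded reference-frame tokens, and a zero-initialized gate blends the identity cues in gradually. The module and its names are an interpretation for illustration, and the "skip" scheduling (which layers or timesteps apply the attention) is not reproduced here.

```python
# Illustrative interpretation of reference-frame injection; not the
# released implementation.
import torch
import torch.nn as nn

class ReferenceAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, max_ref_tokens: int = 1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned positional encoding marks reference tokens as a distinct source.
        self.ref_pos = nn.Parameter(torch.zeros(1, max_ref_tokens, dim))
        self.gate = nn.Parameter(torch.zeros(1))  # starts as an identity mapping

    def forward(self, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # x:   (B, N, D) tokens of the frames being generated
        # ref: (B, M, D) tokens of the reference frame(s)
        ref = ref + self.ref_pos[:, : ref.shape[1]]
        out, _ = self.attn(query=x, key=ref, value=ref)
        # The zero-initialized gate lets identity cues blend in gradually,
        # avoiding the "copy-paste" rigidity mentioned above.
        return x + torch.tanh(self.gate) * out
```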
Evaluated on benchmark datasets including HDTF, CelebV-HQ, EMTD, and EvalTalker, the model achieves state-of-the-art (SOTA) performance—particularly in lip-sync accuracy and identity consistency. Large-scale human evaluations further confirm its superior naturalness and visual realism.
The LongCat Team emphasizes its commitment to open collaboration, releasing the model under the MIT License to empower the developer community. The team aims to solve real-world challenges in digital human creation—from podcast avatars and sales demos to multi-person dialogues—and welcomes community feedback for continuous improvement.
Developers can access LongCat-Video-Avatar via the links below (a minimal download sketch follows the list):
- GitHub: https://github.com/meituan-longcat/LongCat-Video
- Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Video-Avatar
- Project Page: https://meigen-ai.github.io/LongCat-Video-Avatar/
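For example, the checkpoint can be fetched with `huggingface_hub` (`snapshot_download` is a standard Hub utility; the loading and inference steps are model-specific, so follow the GitHub README from there):

```python
# Download the released weights from the Hugging Face Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="meituan-longcat/LongCat-Video-Avatar")
print(f"Model files downloaded to: {local_dir}")
```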
With this release, LongCat paves the way for scalable, expressive, and personalized digital avatars—ushering in a new era of “one prompt, one unique avatar” content creation.