Alibaba Wanx 2.1 Open-Source AI Video Generation


Alibaba Cloud has unveiled Wanx 2.1, a groundbreaking model that redefines AI-driven video generation. Building upon its predecessor, Tongyi Wanxiang, introduced in July 2023, Wanx 2.1 represents a significant leap forward in transforming text inputs into high-quality visual content.

A New Era in AI Video Generation

Wanx 2.1 is designed to generate realistic visuals by accurately handling complex movements, enhancing pixel quality, adhering to physical rules, and following instructions precisely. This precision has propelled Wanx 2.1 to the top of the VBench leaderboard, a comprehensive benchmark suite for video generative models. According to VBench, Wanx 2.1 has an overall score of 84.7% and leads in key dimensions such as dynamic degree, spatial relationships, and multi-object interactions.


Technological Innovations Behind Wanx 2.1

The advancements in Wanx 2.1 are underpinned by several technological innovations:

  • Proprietary VAE and DiT Framework: Wanx 2.1 uses a proprietary Variational Autoencoder (VAE) and Denoising Diffusion Transformer (DiT) framework to strengthen temporal and spatial relationships. This improves visual realism in scenes that involve complex motion and must adhere to physical laws.
  • Full Space-Time Attention Mechanism: This mechanism lets the model mimic the complex dynamics of the real world, so that generated videos stay coherent and realistic.
  • Accelerated Training with Ultra-Long Context: New approaches accelerate the model’s training with ultra-long context. This helps text instructions integrate seamlessly into video generation and makes content creation faster and more intuitive.
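
As a rough illustration of what "full space-time attention" means, the sketch below flattens a tiny video latent into one token sequence, so that every token attends to every other token across all frames and spatial positions. It is not Wanx 2.1's implementation: the function names, the toy latent, and the use of raw tokens as queries, keys, and values (no learned projections) are assumptions made for illustration.

```python
import math


def flatten_video(latent):
    """Flatten a nested [T][H][W][D] list into (positions, tokens)."""
    positions, tokens = [], []
    for t, frame in enumerate(latent):
        for h, row in enumerate(frame):
            for w, vec in enumerate(row):
                positions.append((t, h, w))
                tokens.append(list(vec))
    return positions, tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def full_space_time_attention(latent):
    """Scaled dot-product self-attention over ALL T*H*W tokens at once.

    Assumption: queries, keys, and values are the raw tokens themselves;
    a real model would apply learned projections first.
    """
    positions, tokens = flatten_video(latent)
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights, outputs = [], []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return positions, weights, outputs


# Toy latent: 3 frames, 2x2 spatial grid, 2 channels per token.
latent = [
    [[[float(t + h + w), 1.0] for w in range(2)] for h in range(2)]
    for t in range(3)
]
positions, weights, outputs = full_space_time_attention(latent)
```

Each row of `weights` has one entry per token in the whole clip, so a token in the first frame can draw directly on tokens in the last frame. The cost grows with the square of T*H*W, which is the usual trade-off against attention that is factorized into separate spatial and temporal passes.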

Real-World Applications and Accessibility

Wanx 2.1’s capabilities extend to generating videos with large-scale bodily movements and complex rotations. Even in challenging scenarios such as figure skating, swimming, and diving, the model maintains body coordination and adheres to realistic motion trajectories, setting a new benchmark for video generation. To maximize accessibility, Wanx 2.1 is currently available for free on its official Chinese website.

Individual developers and corporate users can explore its potential through Alibaba Cloud’s generative AI platform, Model Studio. This lets users create high-quality visual content tailored to their needs and helps bridge the gap between AI technology and creative industries. Here is one of the AI videos generated with Wanx 2.1 using the following text prompt:


Text Prompt: 「平拍一位女性花样滑冰运动员在冰场上进行表演的全景。她穿着紫色的滑冰服,脚踩白色的滑冰鞋,正在进行一个旋转动作。她的手臂张开,身体向后倾斜,展现了她的技巧和优雅」

English translation: “A panoramic shot of a female figure skater performing on an ice rink. She is wearing a purple skating outfit and white skates, executing a spinning move. Her arms are outstretched, and her body leans backward, showcasing her skill and grace.”

Open-Sourcing Wan 2.1: A Step Towards Collaborative Innovation

In a move that underscores Alibaba’s commitment to fostering innovation, the company announced plans to release an open-source version of Wan 2.1. This decision comes amid intense competition in China’s AI market, with other companies like DeepSeek also making significant advancements. By open-sourcing Wan 2.1, Alibaba aims to accelerate the adoption of AI technologies and encourage collaborative development within the global AI community.

The open-source release includes four models: T2V-14B (supports both 480P and 720P), T2V-1.3B (supports 480P), I2V-14B-720P (supports 720P), and I2V-14B-480P (supports 480P). They are designed to generate high-quality images and videos from text and image inputs. The models are available for download on Alibaba Cloud’s AI model community, ModelScope, and on the collaborative AI platform Hugging Face, where they are accessible to academics, researchers, and commercial institutions worldwide.
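
The model lineup above can be captured in a small lookup table. This is a minimal sketch: the `WanModel` class, the `task` labels, and the `models_for` helper are hypothetical names, not part of any official Wan 2.1 package. Only the model names and supported resolutions come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WanModel:
    name: str           # model name as given in the release
    task: str           # hypothetical label: "t2v" (text-to-video) or "i2v" (image-to-video)
    resolutions: tuple  # supported output resolutions


MODELS = (
    WanModel("T2V-14B", "t2v", ("480P", "720P")),
    WanModel("T2V-1.3B", "t2v", ("480P",)),
    WanModel("I2V-14B-720P", "i2v", ("720P",)),
    WanModel("I2V-14B-480P", "i2v", ("480P",)),
)


def models_for(task, resolution):
    """Return the names of models that handle the given task at the given resolution."""
    return [m.name for m in MODELS if m.task == task and resolution in m.resolutions]


text_to_video_720p = models_for("t2v", "720P")
image_to_video_480p = models_for("i2v", "480P")
```

For example, asking for text-to-video at 480P returns both T2V models, while 720P narrows the choice to T2V-14B.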


Ethical Considerations and the Future of AI Video Generation

The rapid advancement of AI models like Wanx 2.1 brings forth ethical considerations, particularly concerning the potential misuse of AI-generated content. The ability to create hyper-realistic videos raises questions about consent, privacy, and the proliferation of deepfakes. It is imperative for developers, policymakers, and society at large to engage in discussions about the responsible use of such technologies to mitigate potential risks.

Looking ahead, the open-sourcing of Wan 2.1 is poised to democratize access to advanced AI video generation tools, fostering innovation across various industries. As more developers and researchers engage with the model, we can anticipate a surge in creative applications, from entertainment and education to advertising and beyond. However, this also necessitates the establishment of ethical guidelines and regulatory frameworks to ensure that the technology is used responsibly and for the greater good.

Alibaba Wanx 2.1: Alibaba Cloud Unveiled Wanx 2.1: Redefining AI-Driven Video Generation
Wan 2.1: GitHub Open Source
Wan 2.1 Model Download: Hugging Face
