Use Wan within ElevenLabs
• Generate and edit any visual imaginable
• Experience the world's best AI generation models, all within ElevenLabs
Trusted by 1,000,000+ creators & teams
Why Wan?
Wan is an advanced AI video model developed by Alibaba’s Tongyi Lab. The latest version, Wan 2.6, blends text, image, and audio inputs to produce coherent video sequences with narrative flow, character consistency, and synchronized audiovisual output — enabling richer storytelling from prompt to finished clip.
Core strengths
Multimodal input support
Generate videos from text prompts, images, video references, and audio input, building expressive scenes from hybrid sources.
Intelligent multi-shot storytelling
Automatically orchestrates narrative flow across multiple shots with consistent motion, camera angles, and transitions.
Synced audio‑visual generation
Produces video with native sync across motion, dialogue, music, and ambient sound, with no manual alignment needed.
Extended video duration
Supports 5–15 second clips with scene transitions and coherent visual logic — ideal for short-form narratives and campaign assets.
Character and scene continuity
Maintains facial structure, clothing, and environments across shots for visual consistency and brand fidelity.
Add narration and sound design
Use ElevenLabs audio tools to bring Wan videos to life: add voiceover with your cloned voice, original music with Eleven Music, and cinematic sound effects with AI SFX tools.
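For developers, the same audio tools are reachable through the ElevenLabs API. Here is a minimal sketch, assuming the official `elevenlabs` Python SDK, of generating a narration track you could lay over a Wan clip in Studio; the API key, voice ID, and script text are placeholders.

```python
# pip install elevenlabs
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")  # placeholder: use your account's API key

# Generate narration for a Wan clip. The voice_id is a placeholder;
# substitute the ID of your own cloned voice from the Voices page.
audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",
    model_id="eleven_multilingual_v2",
    text="Every frame tells a story. This one starts with a single prompt.",
    output_format="mp3_44100_128",
)

# convert() streams the audio back as byte chunks; write them to disk,
# then import the file alongside your Wan video in Studio.
with open("wan_narration.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```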
Top models, one platform
Wan runs alongside Kling, Seedream, Nano Banana, and more, all integrated into the ElevenLabs creative workspace.
Bring your creations to Studio — an all-in-one AI editor
Use Studio to finalize Wan projects with full control over audio, timing, and localization.
Timeline editing
Precisely control audio tracks, transitions, and effects across every second of video.
Multilingual voiceover and captions
Add expressive narration and generate captions in over 30 languages for global content delivery.
Secure sharing and collaboration
Share clips with collaborators and clients through permission-based project links.
Built for every creator
From video creators to podcasters and audiobook authors, Studio 3.0 adapts to every workflow, elevating storytelling with the polish of professional production.
Frequently asked questions
Who developed Wan?
Wan is a multimodal AI video generation model developed by Alibaba's Tongyi Lab.
What types of inputs does Wan support?
Wan accepts text prompts, images, video references, and audio input, supporting rich, multi-source generation.
How long are videos generated by Wan?
Wan supports clips ranging from 5 to 15 seconds, with scene and character consistency across shots.
Does Wan generate synchronized audio?
Yes. Newer versions include native audio‑visual sync for voices, sound effects, and music.