Wan

Use Wan within ElevenLabs

• Generate and edit any visual imaginable
• Experience the world's best AI generation models, all within ElevenLabs

Why Wan?

Wan is an advanced AI video model developed by Alibaba’s Tongyi Lab. The latest version, Wan 2.6, blends text, image, and audio inputs to produce coherent video sequences with narrative flow, character consistency, and synchronized audiovisual output — enabling richer storytelling from prompt to finished clip.

Core strengths

• Multimodal input support
  Generate videos from text prompts, images, video references, and audio input, building expressive scenes from hybrid sources (a minimal text-to-video sketch follows this list).
• Intelligent multi-shot storytelling
  Automatically orchestrates narrative flow across multiple shots with consistent motion, camera angles, and transitions.
• Synced audio-visual generation
  Produces video with native sync across motion, dialogue, music, and ambient sound, so no manual alignment is needed.
• Extended video duration
  Supports 5–15 second clips with scene transitions and coherent visual logic, ideal for short-form narratives and campaign assets.
• Character and scene continuity
  Maintains facial structure, clothing, and environments across shots for visual consistency and brand fidelity.
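
For a concrete feel for the text-to-video path described above, the sketch below uses the open-weights Wan 2.1 checkpoint through Hugging Face diffusers. It is illustrative only: the model ID, resolution, frame count, and guidance settings are assumptions about the public release, not the configuration ElevenLabs runs behind the scenes.

```python
# Illustrative text-to-video sketch with an open-weights Wan checkpoint via diffusers.
# Assumes: diffusers with Wan support, a CUDA GPU, and the public
# Wan-AI/Wan2.1-T2V-1.3B-Diffusers checkpoint (not the ElevenLabs-hosted setup).
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

result = pipe(
    prompt="A lone traveler crosses golden dunes at sunset, cinematic lighting",
    height=480,
    width=832,
    num_frames=81,        # roughly a 5-second clip at 16 fps
    guidance_scale=5.0,
)
export_to_video(result.frames[0], "wan_clip.mp4", fps=16)
```
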
Add narration and sound design

Use ElevenLabs audio tools to bring Wan videos to life: voiceover with your cloned voice; original music with ElevenMusic; cinematic sound effects with AI SFX tools.
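
Below is a minimal sketch of the narration step using the ElevenLabs Python SDK; the API key, cloned voice ID, model ID, and output file name are placeholders, and your workspace settings may differ.

```python
# Sketch: generate voiceover for a Wan clip with the ElevenLabs Python SDK.
# YOUR_API_KEY and YOUR_CLONED_VOICE_ID are placeholders.
from elevenlabs.client import ElevenLabs
from elevenlabs import save

client = ElevenLabs(api_key="YOUR_API_KEY")

# Convert a line of narration with a cloned voice; eleven_multilingual_v2 is one
# of the standard text-to-speech models.
audio = client.text_to_speech.convert(
    voice_id="YOUR_CLONED_VOICE_ID",
    model_id="eleven_multilingual_v2",
    text="As night falls over the dunes, the journey is only beginning.",
)

save(audio, "narration.mp3")  # import the file onto your Studio timeline
```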

Top models, one platform
Wan runs alongside Kling, Seedream, Nano Banana, and more – all integrated into the ElevenLabs creative workspace.
  • Google Veo 3
    Video generation
  • Sora 2
    Video generation
  • Topaz Upscale
    Video upscale
  • Veed Lipsync
    Lip syncing
  • Nano Banana
    Image generation
  • Flux 1
    Image generation
  • Kling
    Video generation
  • Seedance
    Video generation
  • Omnihuman
    Lip syncing
  • Wan
    Video generation
  • Seedream
    Image generation

Bring your creations to Studio — an all-in-one AI editor

Use Studio to finalize Wan projects with full control over audio, timing, and localization.

Timeline editing

Precisely control audio tracks, transitions, and effects across every second of video.

Multilingual voiceover and captions

Add expressive narration and generate captions in over 30 languages for global content delivery.

Secure sharing and collaboration

Share clips with collaborators and clients through permission-based project links.

Built for every creator

From marketers and brand teams to filmmakers and storyboard artists, Studio adapts to every workflow, elevating storytelling with the polish of professional production.
Marketers and brand teams

Produce narrative ad spots and branded videos with consistent visual identity and synchronized voice.

Video creators and filmmakers

Transform prompts into multi-shot scenes with expressive audio and cinematic storytelling.

Designers and storyboard artists

Animate design frames or stills into coherent video sequences with continuity across shots.

Enterprise-grade security and infrastructure at scale

  • SOC 2 Type II and ISO 27001 certified
  • HIPAA and GDPR compliance
  • EU data residency options
  • Zero retention for TTS and STT

Frequently asked questions

Who developed Wan?

Wan is a multimodal AI video generation model developed by Alibaba's Tongyi Lab.

What types of inputs does Wan support?

Wan accepts text prompts, images, video references, and audio input, supporting rich, multi-source generation.

How long are videos generated by Wan?

Wan supports clips ranging from 5 to 15 seconds, with scene and character consistency across shots.

Does Wan generate synchronized audio?

Yes. Newer versions include native audio‑visual sync for voices, sound effects, and music.