# Installation
Set up Wan 2.2 locally and run S2V.
## Clone

```shell
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
```
## Install

```shell
# Ensure torch >= 2.4.0.
# If the installation of `flash_attn` fails, try installing the other packages first
# and install `flash_attn` last.
pip install -r requirements.txt
```
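Since `flash_attn` builds against the installed torch, it can help to confirm the torch version before installing it. A minimal sketch (assumes `python` is on `PATH`; `version_ge` is a hypothetical helper, not part of the repo):

```shell
# version_ge A B: succeeds if dotted version A >= B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Query the installed torch version; fall back to 0 if torch is absent.
TORCH_VERSION="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null || echo 0)"
# Strip any local build suffix such as "+cu121" before comparing.
if version_ge "${TORCH_VERSION%%+*}" "2.4.0"; then
  echo "torch ${TORCH_VERSION} is new enough"
else
  echo "torch >= 2.4.0 required (found: ${TORCH_VERSION})"
fi
```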
## Models
- T2V-A14B — Text-to-Video MoE, 480P & 720P
- I2V-A14B — Image-to-Video MoE, 480P & 720P
- TI2V-5B — High-compression VAE, T2V+I2V, 720P
- S2V-14B — Speech-to-Video, 480P & 720P
Note: TI2V-5B supports 720P generation at 24 FPS.
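For reference, a sketch of how the variants above pair with `--task` and `--ckpt_dir` values in `generate.py`. The task spellings other than `s2v-14B` are assumptions based on the repo's naming convention, and the directory names assume each model was downloaded with a matching `--local-dir` as in the commands below; `ckpt_dir_for` is a hypothetical helper:

```shell
# ckpt_dir_for TASK: print the checkpoint directory assumed for a given --task
# value, mirroring the ./Wan2.2-<MODEL> download layout used in this guide.
ckpt_dir_for() {
  case "$1" in
    t2v-A14B) echo ./Wan2.2-T2V-A14B ;;  # assumed task spelling
    i2v-A14B) echo ./Wan2.2-I2V-A14B ;;  # assumed task spelling
    ti2v-5B)  echo ./Wan2.2-TI2V-5B ;;   # assumed task spelling
    s2v-14B)  echo ./Wan2.2-S2V-14B ;;   # used by the commands below
    *) return 1 ;;
  esac
}
```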
### huggingface-cli

```shell
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
```
### modelscope-cli

```shell
pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
```
## Run S2V

Single-GPU inference (`--offload_model True` and `--convert_model_dtype` reduce GPU memory usage):

```shell
python generate.py --task s2v-14B --size 1024*704 --ckpt_dir ./Wan2.2-S2V-14B/ \
  --offload_model True --convert_model_dtype \
  --prompt "a person is speaking" --image "examples/i2v_input.JPG" --audio "examples/talk.wav"
```

Multi-GPU inference using FSDP and Ulysses sequence parallelism across 8 GPUs:

```shell
torchrun --nproc_per_node=8 generate.py --task s2v-14B --size 1024*704 --ckpt_dir ./Wan2.2-S2V-14B/ \
  --dit_fsdp --t5_fsdp --ulysses_size 8 \
  --prompt "a person is speaking" --image "examples/i2v_input.JPG" --audio "examples/talk.wav"
```
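A typo in any of these paths only surfaces after model loading begins, so a quick pre-flight check can save time. A minimal sketch (`check_inputs` is a hypothetical helper; the paths are the ones used in the commands above):

```shell
# check_inputs PATH...: report any missing paths; exit status is non-zero if any are absent.
check_inputs() {
  missing=0
  for p in "$@"; do
    [ -e "$p" ] || { echo "missing: $p"; missing=1; }
  done
  return "$missing"
}

# Paths assumed by the generate.py commands above.
check_inputs ./Wan2.2-S2V-14B examples/i2v_input.JPG examples/talk.wav \
  || echo "fix the paths above before running generate.py"
```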