Wan S2V

Wan S2V FAQs

Do I need prompts?

No. Prompts are optional: the audio and the reference image are sufficient to generate a video. A prompt only adds scene intent on top of them.

How is duration decided?

By default, the length of the audio sets the duration of the video. Pass --num_clip to generate a shorter preview.
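
As a rough illustration of this rule, the sketch below reads the length of a WAV file with Python's standard wave module and estimates how many clips would cover it. The fps and frames_per_clip defaults and all function names are assumptions made for illustration, not values confirmed by this FAQ; only the idea that --num_clip limits output to fewer clips comes from the answer above.

```python
import math
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clips_needed(duration_s, fps=16, frames_per_clip=80):
    """Estimate how many clips cover the audio (fps and clip length are assumed)."""
    return max(1, math.ceil(duration_s * fps / frames_per_clip))


def effective_duration(duration_s, num_clip=None, fps=16, frames_per_clip=80):
    """Video length in seconds: the full audio, or less when num_clip caps it."""
    if num_clip is None:
        return duration_s  # default: audio length decides
    return min(duration_s, num_clip * frames_per_clip / fps)


if __name__ == "__main__":
    audio_s = 12.0  # hypothetical 12-second audio track
    print(clips_needed(audio_s))
    print(effective_duration(audio_s, num_clip=1))
```

Under these assumed defaults, one clip spans 80 / 16 = 5 seconds, so capping a 12-second track at one clip gives a preview of about 5 seconds.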

Can I control body pose?

Yes. Provide a pose video for pose-driven output while keeping audio sync.

What resolutions are common?

480P and 720P. The size is specified as a pixel area, and the aspect ratio follows the reference image.
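
To make "area plus image aspect ratio" concrete, here is a minimal sketch of how a width and height could be derived from a target pixel area. The helper name fit_size_to_area and the rounding to a multiple of 16 are assumptions for illustration; the actual rounding used by the model may differ.

```python
import math


def fit_size_to_area(image_w, image_h, target_area, multiple=16):
    """Pick (width, height) with roughly target_area pixels and the image's aspect ratio.

    Rounding each side to a multiple of 16 is an assumed constraint, not a documented one.
    """
    aspect = image_w / image_h
    height = math.sqrt(target_area / aspect)
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    # A 16:9 reference image with a 1280 x 720 pixel budget keeps its shape.
    print(fit_size_to_area(1920, 1080, 1280 * 720))
    # A portrait image with the same budget yields a portrait output.
    print(fit_size_to_area(1080, 1920, 1280 * 720))
```

The same area budget therefore produces different width and height pairs depending on the image, which is why the size is given as an area rather than as fixed dimensions.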

What inputs are required?

One image and one audio file. A prompt and a pose video are optional.
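
The required and optional inputs can be summarized in a small container like the one below. This is a hypothetical sketch for organizing a job before it is launched; the class name S2VInputs and its fields are illustrative and are not part of any Wan API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class S2VInputs:
    """Hypothetical bundle of inputs for one speech-to-video job."""

    image: str                        # required: path to the reference image
    audio: str                        # required: path to the audio file
    prompt: Optional[str] = None      # optional: scene intent
    pose_video: Optional[str] = None  # optional: pose-driving video

    def __post_init__(self):
        # Only the two required inputs are checked.
        if not self.image:
            raise ValueError("a reference image is required")
        if not self.audio:
            raise ValueError("an audio file is required")

    def summary(self):
        """Describe which inputs are set, skipping unset optional ones."""
        parts = [f"image={self.image}", f"audio={self.audio}"]
        if self.prompt:
            parts.append(f"prompt={self.prompt}")
        if self.pose_video:
            parts.append(f"pose_video={self.pose_video}")
        return ", ".join(parts)


if __name__ == "__main__":
    print(S2VInputs(image="face.png", audio="speech.wav").summary())
```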