Wan S2V FAQs
Do I need prompts?
No, prompts are optional. The audio and the reference image are sufficient to generate a video. A prompt adds scene intent, such as the setting or the action you want to see.
How is duration decided?
By default, the length of the audio sets the duration of the video. Use --num_clip to limit the number of generated clips when you want a shorter preview.
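As a rough illustration of how audio length relates to clip count, the sketch below estimates how many fixed-length clips are needed to cover an audio track and how long a preview with a given --num_clip would run. The frame rate (16 fps) and clip length (80 frames) are assumptions chosen for the example, not values stated in this FAQ, and the function names are hypothetical.

```python
import math


def estimate_num_clips(audio_seconds: float, fps: int = 16,
                       frames_per_clip: int = 80) -> int:
    """Estimate how many fixed-length clips cover the whole audio track.

    fps and frames_per_clip are assumed values for illustration only.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    total_frames = math.ceil(audio_seconds * fps)
    return math.ceil(total_frames / frames_per_clip)


def preview_seconds(num_clip: int, fps: int = 16,
                    frames_per_clip: int = 80) -> float:
    """Duration in seconds of a preview limited to num_clip clips."""
    if num_clip < 1:
        raise ValueError("num_clip must be at least 1")
    return num_clip * frames_per_clip / fps
```

Under these assumed values, a 12-second audio track needs three clips, and passing --num_clip 1 would yield a preview of about five seconds.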
Can I control body pose?
Yes. Provide a pose video to drive the body motion. The output follows the poses while staying synchronized with the audio.
What resolutions are common?
480P and 720P are the common choices. The size is specified as a target area (total pixel count), and the output aspect ratio follows the reference image.
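To make "size as area" concrete, here is a minimal sketch of deriving output dimensions from a target area while keeping the reference image's aspect ratio. The function name and the rounding to a multiple of 16 are assumptions for illustration; the actual rounding rule used by the model may differ.

```python
import math
from typing import Tuple


def size_from_area(image_w: int, image_h: int, target_area: int,
                   multiple: int = 16) -> Tuple[int, int]:
    """Return (width, height) with roughly target_area pixels.

    The aspect ratio is taken from the reference image. Rounding each side
    to the nearest `multiple` is an assumed convention, not a documented one.
    """
    if min(image_w, image_h, target_area, multiple) <= 0:
        raise ValueError("all arguments must be positive")
    aspect = image_w / image_h
    # Solve width * height = target_area with width = aspect * height.
    height = math.sqrt(target_area / aspect)
    width = height * aspect
    w = max(multiple, round(width / multiple) * multiple)
    h = max(multiple, round(height / multiple) * multiple)
    return w, h
```

For example, a 1920x1080 landscape image with a target area of 1280*720 gives 1280x720, while a 1080x1920 portrait image with the same area gives 720x1280.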
What inputs are required?
One reference image and one audio file are required. A prompt and a pose video are optional.
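The required and optional inputs can be summarized as a small argument builder. This is a sketch only: --num_clip is the one flag named in this FAQ, while --image, --audio, --prompt, and --pose_video are assumed flag names, and build_s2v_args is a hypothetical helper. Check the generation script's help output for the real options.

```python
from typing import List, Optional


def build_s2v_args(image: str, audio: str,
                   prompt: Optional[str] = None,
                   pose_video: Optional[str] = None,
                   num_clip: Optional[int] = None) -> List[str]:
    """Assemble command-line arguments for an S2V run.

    Flag names other than --num_clip are assumptions for illustration.
    """
    if not image or not audio:
        raise ValueError("an image and an audio file are both required")
    args = ["--image", image, "--audio", audio]
    if prompt:
        # Optional: adds scene intent.
        args += ["--prompt", prompt]
    if pose_video:
        # Optional: drives body pose.
        args += ["--pose_video", pose_video]
    if num_clip is not None:
        if num_clip < 1:
            raise ValueError("num_clip must be at least 1")
        # Optional: limits the number of clips for a shorter preview.
        args += ["--num_clip", str(num_clip)]
    return args
```

Calling build_s2v_args("face.png", "speech.wav") returns only the two required arguments. The optional ones are appended only when supplied.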