Transform videos with text prompts. Provide a reference video and a prompt describing the desired output. Supports IC-LoRA control modes (canny, depth, pose, detailer) and optional first-frame conditioning.
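As a rough illustration of the inputs described above, the sketch below assembles and validates a request payload. It is not the service's actual API: the class name `VideoTransformRequest` and the field names `video_url`, `prompt`, `control`, and `first_frame_url` are all assumptions made for this example. Only the four control modes come from the description.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Control modes named in the description; everything else here is hypothetical.
CONTROL_MODES = {"canny", "depth", "pose", "detailer"}


@dataclass
class VideoTransformRequest:
    """Hypothetical request shape; field names are illustrative only."""

    video_url: str                          # reference video to transform
    prompt: str                             # text describing the desired output
    control: Optional[str] = None           # optional IC-LoRA control mode
    first_frame_url: Optional[str] = None   # optional first-frame conditioning image

    def to_payload(self) -> dict:
        """Validate the fields and return a dict with unset options omitted."""
        if not self.video_url:
            raise ValueError("a reference video is required")
        if not self.prompt.strip():
            raise ValueError("a text prompt is required")
        if self.control is not None and self.control not in CONTROL_MODES:
            raise ValueError(f"unknown control mode: {self.control!r}")
        return {key: value for key, value in asdict(self).items() if value is not None}


# Example: depth-controlled transformation without first-frame conditioning.
request = VideoTransformRequest(
    video_url="https://example.com/input.mp4",  # placeholder URL
    prompt="the same scene rendered as a watercolor painting",
    control="depth",
)
payload = request.to_payload()
```

Omitting unset options from the payload keeps the optional inputs (control mode, first frame) truly optional, which matches how the description presents them.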