Video
LTX-2.3 Pro Audio to Video
Generate video driven by an audio clip. Voice cadence controls pacing, and musical energy shapes motion. Up to 20 seconds, 1080p, precise audio-visual sync.
Turn any audio clip into a synchronized video. Provide a 2- to 20-second audio file, optionally with a first-frame image or a text prompt. LTX-2.3 Pro uses a joint audio-video diffusion transformer that reads the audio's temporal structure to control motion timing, pacing, and emphasis. The result is a video whose visuals move with the sound, not just alongside it. Output runs up to 20 seconds at 1080p, and the guidance scale controls how closely the result follows the prompt. Built for podcasts, voice-driven narratives, avatar animation, and audio-led creative production.
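As a rough illustration of the input constraints described above, the sketch below checks that a WAV clip falls within the 2 to 20 second range before assembling a request. It uses only Python's standard library. The function names and every key in the request dictionary (`prompt`, `first_frame_url`, `guidance_scale`, `audio_duration_seconds`) are hypothetical placeholders chosen for this example, not the actual LTX-2.3 Pro API schema, and the sketch assumes uncompressed WAV input.

```python
import io
import wave

MIN_SECONDS = 2.0   # lower bound on audio length stated in the description
MAX_SECONDS = 20.0  # upper bound on audio length stated in the description


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_request(wav_bytes, prompt=None, first_frame_url=None, guidance_scale=None):
    """Validate the clip length and assemble a request dictionary.

    Every key used here is a hypothetical placeholder, not the real API schema.
    """
    duration = wav_duration_seconds(wav_bytes)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"audio is {duration:.2f}s; expected {MIN_SECONDS:.0f} to {MAX_SECONDS:.0f}s"
        )
    request = {"audio_duration_seconds": duration}
    # The first frame, the text prompt, and the guidance scale are all optional,
    # so they are included only when the caller supplies them.
    if prompt is not None:
        request["prompt"] = prompt
    if first_frame_url is not None:
        request["first_frame_url"] = first_frame_url
    if guidance_scale is not None:
        request["guidance_scale"] = guidance_scale
    return request


def silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Create a mono, 16-bit silent WAV clip in memory (handy for trying this out)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

Calling `build_request(silent_wav(5), prompt="a singer on stage")` returns a dictionary with the measured duration and the prompt, while a 1-second clip raises `ValueError`. Checking the length locally avoids sending a clip the model would reject.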