image
video
Omni Human 1.5
Omni Human 1.5 by ByteDance creates realistic talking avatars from a single image and audio with film-grade lip-sync. Pricing from $0.16/generation.
- First Frame
- animation
- audio
- lipsync
- speech
Omni Human 1.5 is ByteDance's closed-source image-to-video model (Oct 2025), specialized for digital human generation. It creates realistic talking avatars from a single image and an audio track, with optional text prompts for scene control. The model prioritizes expressive, accurate lip-sync and coherent character animation over general-purpose video generation, which makes it well suited to talking avatars and educational content. It analyzes the audio to produce animation synchronized with the rhythm of the speech. The model is not available on Scenario; API pricing starts at $0.16 per generation.
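As a rough illustration of the inputs described above (one image, one audio track, an optional text prompt) and of the starting price, here is a minimal Python sketch. The class name `AvatarRequest`, the payload keys `image`, `audio`, and `prompt`, and the helper `minimum_cost` are all hypothetical and do not reflect an official API schema. The cost figure is only a lower bound, since pricing is quoted as "from" $0.16 per generation.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Starting price quoted in the description; actual cost per generation may be higher.
MIN_PRICE_PER_GENERATION = Decimal("0.16")


@dataclass(frozen=True)
class AvatarRequest:
    """Hypothetical container for one generation; not an official schema."""

    image_path: str                # the single reference image
    audio_path: str                # the speech track that drives the lip-sync
    prompt: Optional[str] = None   # optional text prompt for scene control

    def to_payload(self) -> dict:
        """Build a plain dict, leaving out the prompt when it is not set."""
        if not self.image_path or not self.audio_path:
            raise ValueError("both an image and an audio track are required")
        payload = {"image": self.image_path, "audio": self.audio_path}
        if self.prompt:
            payload["prompt"] = self.prompt
        return payload


def minimum_cost(generations: int) -> Decimal:
    """Lower bound on spend, assuming every generation bills at the starting price."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return MIN_PRICE_PER_GENERATION * generations
```

For example, `minimum_cost(25)` gives a floor of $4.00 for 25 generations; the real total depends on whatever pricing tiers apply beyond the starting rate.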