Grok Imagine Video is xAI's generative video model that supports three creative workflows: text-to-video, image-to-video, and video-to-video editing. Built on the Grok Imagine engine, the model generates videos with cinema-grade physics simulation, creating realistic motion, object interactions, and environmental effects. It also produces natively synchronized audio — including background music, sound effects, ambient audio, and dialogue with accurate lip synchronization — making it one of the few video generation models that output complete audiovisual content in a single pass.
In text-to-video mode, you describe the scene in a prompt and the model generates a video 1 to 15 seconds long at 480p or 720p resolution, with flexible aspect ratios ranging from 16:9 widescreen to 9:16 vertical. In image-to-video mode, you provide a source image as the first frame plus a prompt describing the desired motion, and the model animates the still image with realistic movement and dynamics. In video-to-video editing mode, you supply a source video (automatically preprocessed to at most 8.7 seconds at 720p) along with a prompt describing the desired changes; the model can restyle scenes, add or remove objects, and control motion while preserving the source video's duration and aspect ratio. A single request can generate up to 4 videos concurrently.
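A client calling the model still has to respect the limits stated for each mode. The Python sketch that follows checks a generation request against the documented constraints: 1 to 15 seconds, 480p or 720p, at most 4 videos per request, and prompts of up to 10,000 characters. The `GenerationRequest` fields and the `validate` helper are hypothetical and do not mirror xAI's actual API. The sketch also assumes that any aspect ratio between 9:16 and 16:9 is accepted, whereas the real service may only accept a fixed list of presets.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional

# Limits taken from the model description.
MIN_DURATION_S, MAX_DURATION_S = 1, 15
RESOLUTIONS = {"480p", "720p"}
MAX_VIDEOS_PER_REQUEST = 4
MAX_PROMPT_CHARS = 10_000

# Assumption: any W:H ratio between 9:16 and 16:9 (inclusive) is accepted.
MIN_RATIO, MAX_RATIO = Fraction(9, 16), Fraction(16, 9)


@dataclass
class GenerationRequest:
    """Hypothetical request shape; field names are illustrative, not xAI's API."""
    prompt: str
    duration_s: int
    resolution: str
    aspect_ratio: str  # e.g. "16:9" or "9:16"
    num_videos: int = 1
    image_url: Optional[str] = None  # hypothetical: first frame for image-to-video

    @property
    def mode(self) -> str:
        # A source image switches the request from text-to-video to image-to-video.
        return "image-to-video" if self.image_url else "text-to-video"


def parse_ratio(text: str) -> Fraction:
    """Parse "W:H" into an exact fraction; raises ValueError if malformed."""
    w, h = (int(part) for part in text.split(":"))
    if w <= 0 or h <= 0:
        raise ValueError(f"non-positive ratio: {text}")
    return Fraction(w, h)


def validate(req: GenerationRequest) -> List[str]:
    """Return human-readable problems; an empty list means the request looks valid."""
    errors: List[str] = []
    if not req.prompt.strip():
        errors.append("prompt is empty")
    if len(req.prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        errors.append(f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} s")
    if req.resolution not in RESOLUTIONS:
        errors.append(f"resolution must be one of {sorted(RESOLUTIONS)}")
    try:
        if not MIN_RATIO <= parse_ratio(req.aspect_ratio) <= MAX_RATIO:
            errors.append("aspect ratio must lie between 9:16 and 16:9")
    except ValueError:
        errors.append(f"malformed aspect ratio: {req.aspect_ratio!r}")
    if not 1 <= req.num_videos <= MAX_VIDEOS_PER_REQUEST:
        errors.append(f"num_videos must be 1-{MAX_VIDEOS_PER_REQUEST}")
    return errors
```

For example, `validate(GenerationRequest(prompt="A red fox sprints across fresh snow", duration_s=8, resolution="720p", aspect_ratio="9:16", num_videos=2))` returns an empty list. Collecting every problem rather than stopping at the first lets a UI report all invalid settings at once.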
The model follows instructions closely and accepts prompts of up to 10,000 characters, leaving room for detailed scene descriptions, camera direction, and motion choreography. Video duration, resolution, and aspect ratio are fully configurable in the two generation modes (text-to-video and image-to-video), while editing mode inherits these properties from the source video. Grok Imagine Video is well suited to rapid prototyping of cinematics, game trailers, social media content, animated concept art, and any workflow that benefits from fast, high-quality video generation with built-in audio.
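Because editing mode derives its output settings from the source video rather than from request parameters, a client may want to predict what it will get back. The sketch that follows models the documented preprocessing (at most 8.7 seconds at 720p) under one labeled assumption: that "720p" means the shorter side is downscaled to at most 720 pixels while the aspect ratio is kept. The `VideoProps` type and `resolve_edit_output` function are hypothetical helpers, not part of any xAI SDK.

```python
from dataclasses import dataclass
from math import gcd

# Documented preprocessing cap for editing sources.
MAX_EDIT_SOURCE_S = 8.7
# Assumption: "720p" means the shorter side is scaled down to at most 720 px.
MAX_EDIT_SHORT_SIDE = 720


@dataclass(frozen=True)
class VideoProps:
    """Illustrative container for the properties editing mode inherits."""
    width: int
    height: int
    duration_s: float

    @property
    def aspect_ratio(self) -> str:
        # Reduce W:H to lowest terms, e.g. 1280x720 -> "16:9".
        g = gcd(self.width, self.height)
        return f"{self.width // g}:{self.height // g}"


def resolve_edit_output(source: VideoProps) -> VideoProps:
    """Predict output properties for a video-to-video edit.

    Editing mode takes no duration, resolution, or aspect-ratio settings:
    the output inherits them from the source after preprocessing.
    """
    # Sources longer than the cap are shortened; shorter ones keep their length.
    duration = min(source.duration_s, MAX_EDIT_SOURCE_S)
    short_side = min(source.width, source.height)
    if short_side > MAX_EDIT_SHORT_SIDE:
        # Uniform downscale keeps the aspect ratio (up to integer rounding).
        width = round(source.width * MAX_EDIT_SHORT_SIDE / short_side)
        height = round(source.height * MAX_EDIT_SHORT_SIDE / short_side)
    else:
        width, height = source.width, source.height
    return VideoProps(width, height, duration)
```

Under these assumptions, a 12-second 1920x1080 clip resolves to an 8.7-second 1280x720 output that keeps its 16:9 shape, while a 5-second 720x1280 vertical clip passes through unchanged.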