Grok Imagine Image is xAI's image generation model powered by Aurora, an autoregressive mixture-of-experts network trained on interleaved text and image data. Aurora excels at photorealistic rendering with precise adherence to text instructions, and supports a wide range of visual styles, from ultra-realistic photography and cinematic scenes to anime, oil paintings, and pencil sketches. The model is particularly strong at rendering accurate real-world details, text within images, and realistic human portraits, areas where many competing models still struggle.

The model supports both text-to-image generation and image editing. In generation mode, provide a text prompt describing the desired image and select an aspect ratio (from 2:1 landscape to 1:2 portrait, plus an auto option). In editing mode, supply a source image alongside a prompt describing the desired changes; the model natively understands image content and applies modifications while preserving the overall composition. Aspect ratio settings are ignored when editing, because the output matches the source image dimensions. You can generate up to 10 images per request, making it easy to explore multiple creative directions at once.
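The two modes and their limits can be sketched as a small client-side validation helper. This is an illustrative sketch only: the function names and the payload field names (`mode`, `num_images`, `aspect_ratio`, `source_image`) are hypothetical and are not taken from xAI's API. The text above gives only the range of aspect ratios, not the exact list, so the check below accepts any `W:H` value between 1:2 and 2:1, which may be looser than what the service allows. It also assumes the 10-image limit applies in both modes.

```python
from fractions import Fraction
from typing import Optional

MAX_IMAGES = 10              # stated limit: up to 10 images per request
MIN_RATIO = Fraction(1, 2)   # 1:2 portrait
MAX_RATIO = Fraction(2, 1)   # 2:1 landscape


def ratio_in_range(aspect_ratio: str) -> bool:
    """Return True for "auto" or a "W:H" string between 1:2 and 2:1."""
    if aspect_ratio == "auto":
        return True
    try:
        width, height = (int(part) for part in aspect_ratio.split(":"))
    except ValueError:
        return False  # not exactly two integer parts
    if width <= 0 or height <= 0:
        return False
    return MIN_RATIO <= Fraction(width, height) <= MAX_RATIO


def build_request(
    prompt: str,
    num_images: int = 1,
    aspect_ratio: str = "auto",
    source_image: Optional[str] = None,
) -> dict:
    """Build a hypothetical request payload (field names are assumptions)."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if not 1 <= num_images <= MAX_IMAGES:
        raise ValueError(f"num_images must be between 1 and {MAX_IMAGES}")

    request = {"prompt": prompt, "num_images": num_images}
    if source_image is not None:
        # Editing mode: the aspect ratio is ignored because the output
        # matches the source image dimensions, so it is left out.
        request["mode"] = "edit"
        request["source_image"] = source_image
    else:
        if not ratio_in_range(aspect_ratio):
            raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
        request["mode"] = "generate"
        request["aspect_ratio"] = aspect_ratio
    return request
```

For example, `build_request("a red fox in snow", num_images=4, aspect_ratio="16:9")` yields a generation payload, while passing `source_image` switches to editing and drops the aspect ratio entirely.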

Grok Imagine Image produces images at 1K resolution and ranks among the top text-to-image models on Arena.ai's Image Arena benchmarks, sitting on the Pareto frontier for cost-to-quality among models in its price range. Its multimodal foundation allows bidirectional processing between text and images, enabling flexible workflows from concept art and product visualization to logo design and style transfer. Prompts can be up to 10,000 characters, giving you room for highly detailed scene descriptions and creative direction.
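Long prompts can be checked against the 10,000-character limit before a request is sent. The sketch below is a minimal illustration with a hypothetical helper name. It assumes the limit is counted the way Python's `len` counts, in Unicode code points; the text does not say how the service counts characters, so the real boundary may differ for some inputs.

```python
MAX_PROMPT_CHARS = 10_000  # stated limit on prompt length


def check_prompt(prompt: str) -> str:
    """Return the prompt unchanged, or raise if it exceeds the limit.

    Assumption: length is measured in Unicode code points via len().
    """
    length = len(prompt)
    if length > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {length} characters; the limit is {MAX_PROMPT_CHARS}"
        )
    return prompt
```

Raising an error rather than silently truncating keeps a detailed scene description from losing its final instructions unnoticed.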


