Introducing Grok Imagine Image Pro: Transform text descriptions into high-quality visuals (txt2img)

What is Grok Imagine Image Pro?
Grok Imagine Image Pro turns a text description into a production-ready visual, no photoshoot required. Built for brands and creators who move fast, it generates high-quality, commercially ready images that fit straight into marketing campaigns, ads, and social content without the usual back-and-forth.
Built on xAI's Aurora architecture, the model is unusually good at following detailed instructions. Describe exactly what you want and it delivers precise composition, photorealistic detail, and consistent style. It also works with images you already have: feed it an existing image and a text prompt, and you can edit, enhance, or fully transform it on the spot. From clean product shots to elaborate artistic scenes, it handles the full range without losing quality.
Key Capabilities
- Generate high-quality, commercially ready images from text descriptions
- Edit, enhance, and transform existing photos using text prompts
- Produce a wide range of visual styles, from photorealistic to artistic
- Render fine detail and lifelike imagery with consistent quality
- Translate complex, specific creative briefs into precise visuals
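Text-to-image generation and image editing differ mainly in whether a source image accompanies the prompt. The sketch below shows one way a client might assemble a request body for each case. The field names (`mode`, `prompt`, `image_url`) and the `build_request` helper are hypothetical; they are not taken from a documented xAI or provider API.

```python
from typing import Optional


def build_request(prompt: str, image_url: Optional[str] = None) -> dict:
    """Assemble a hypothetical request body for txt2img or image editing.

    Field names are illustrative assumptions, not a documented API schema.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Text-to-image: only a text description is supplied.
    body = {"mode": "txt2img", "prompt": prompt.strip()}
    # Image editing: an existing image plus a text instruction.
    if image_url is not None:
        body["mode"] = "edit"
        body["image_url"] = image_url
    return body


print(build_request("A red bicycle against a white wall"))
print(build_request("Make the wall blue", image_url="https://example.com/bike.png"))
```

Keeping both modes behind one helper means the prompt handling (trimming, empty-check) is shared, and only the presence of an image switches the mode.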
Examples
The example prompts below show the kind of detailed, intentional input Grok Imagine Image Pro is designed to work with.

{
"meta": {
"style": "8k raw photo, hyper-detailed, photorealistic masterpiece, National Geographic aesthetic",
"creativity_temp": 1.8
},
"subject": {
"identity": "Realistic interpretation of Clash Royale Hog Rider and mount. Rider: muscular dark-skinned male, defined vascularity, signature black mohawk, gold nose ring. Hog: massive boar, pinkish-grey skin, prominent ivory tusks.",
"pose_action": "Mid-gallop across shallow water. Hog's front hooves smashing into the brine, generating a crown-splash of saline droplets. Rider leaning forward, gripping leather reins, golden warhammer raised.",
"material_detail": "Rider Skin: PBR subsurface scattering, visible sweat pores, glistening moisture, hyper-realistic melanin texture. Hog Fur: Coarse, stiff bristles, wet and matted near legs, distinct follicle density. Leather: Worn saddle texture, cracked edges. Metal: Hammer gold with micro-scratches and oxidation."
},
"environment": {
"location": "Salar de Uyuni, Bolivia. Infinite horizon where sky meets earth.",
"background_elements": "Seamless mirror reflection of the azure sky and cumulus clouds on the ground. Hexagonal salt crust patterns visible through translucent shallow water.",
"atmosphere": "High-altitude clarity, thin air, zero haze. Water surface tension breaking at impact points. Crystalline salt particles suspended in splash droplets."
},
"lighting": {
"source_angle": "High-noon zenith sun, hard directional light, 90-degree angle.",
"kelvin_quality": "5800K pure daylight, blinding white albedo from salt reflection.",
"visual_effects": "Ray-traced reflections, harsh contact shadows, specular highlights on wet skin and water ripples, slight chromatic aberration on water droplets."
},
"camera_specs": {
"gear_lens": "Phase One XF IQ4 150MP, 28mm wide-angle prime lens.",
"aperture_iso": "f/11 for deep depth of field, ISO 50, 1/4000s shutter speed to freeze water.",
"film_finish": "Kodak Ektar 100 simulation, high contrast, saturated blues and golds, ultra-sharp focus."
}
}
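The structured prompts on this page share five sections (`meta`, `subject`, `environment`, `lighting`, `camera_specs`). The page does not say whether the model interprets those keys or simply reads the whole block as text, so if you need a plain-text version (for example, for a tool that only accepts a single prompt string), one option is to flatten the string fields in section order. A minimal sketch; `flatten_prompt` is a hypothetical helper, and skipping numeric fields such as `creativity_temp` is an assumption.

```python
import json

# Section order shared by the structured prompts on this page.
SECTION_ORDER = ["meta", "subject", "environment", "lighting", "camera_specs"]


def flatten_prompt(structured: dict) -> str:
    """Join the string fields of a structured prompt into one text prompt.

    Non-string values such as creativity_temp are skipped, on the assumption
    that they are settings rather than descriptive text. Sections not listed
    in SECTION_ORDER are ignored.
    """
    parts = []
    for section in SECTION_ORDER:
        for value in structured.get(section, {}).values():
            if isinstance(value, str) and value.strip():
                # Drop trailing periods so the join does not produce "..".
                parts.append(value.strip().rstrip("."))
    return ". ".join(parts) + "." if parts else ""


example = json.loads(
    '{"meta": {"style": "8k raw photo", "creativity_temp": 1.8},'
    ' "subject": {"identity": "A hog rider."},'
    ' "lighting": {"source_angle": "High-noon sun"}}'
)
print(flatten_prompt(example))  # 8k raw photo. A hog rider. High-noon sun.
```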
Ultra-realistic cinematic scene inside a movie studio, filmed as a selfie from the girl’s own cellphone front-facing camera perspective. The girl is clearly holding the phone at arm’s length, taking a selfie, but the cellphone itself is NOT visible anywhere in the frame, since the camera viewpoint is the phone’s front camera. The framing, angle, and slight handheld distortion must clearly communicate that this is a selfie shot.
The girl is perfectly integrated into the scene and is still carrying her bag naturally.
Behind her, a professional film crew is visible with studio lights, cameras, tripods, and boom microphones. Further in the background, behind the film crew, is Guts from Berserk, portrayed in live-action style, wearing his iconic armor and holding his massive sword. Guts is positioned directly behind the girl and is looking straight into the camera.
In the far background, surrounded by dark smoke, shadows, and dramatic supernatural lighting, are the five demonic beings known as the God Hand.
Ultra-realistic style, cinematic lighting, shallow depth of field, high dynamic range, realistic skin textures, subtle film grain, and natural motion blur consistent with a handheld selfie shot.
{
"meta": {
"style": "Unreal Engine 5.2 render mixed with 8k raw photography, hyper-realistic background vs stylized character, cinematic composition",
"creativity_temp": 1.8
},
"subject": {
"identity": "Stylized Fortnite-aesthetic tactical aviator avatar, vibrant saturated color palette",
"pose_action": "Standing confident amidst wreckage, inspecting a rusted turbine engine",
"material_detail": "Smooth diffuse shading on character skin, clean ballistic nylon textures, matte polymer armor plates, emissive neon teal LED accents, sharp cel-shaded normal maps contrasting against realistic grime"
},
"environment": {
"location": "Mojave Desert Aircraft Boneyard, Davis-Monthan AFB inspiration",
"background_elements": "Decommissioned Boeing 747 and B-52 fuselages, oxidized bare aluminum, flaking chemically-weathered paint, sun-bleached nose art, cracked caliche clay ground, dry tumbleweeds",
"atmosphere": "Intense heat haze distortion (shimmering mirage effect), suspended silica dust particles, dry arid air density, atmospheric perspective fading into cyan horizon"
},
"lighting": {
"source_angle": "High-noon zenith sunlight, direct incidence",
"kelvin_quality": "6500K harsh daylight, blinding white specular highlights on metal",
"visual_effects": "Ray-traced hard shadows, Global Illumination (Lumen) bounce light from sand to metal, lens flare anamorphic streaks, ambient occlusion in mechanical crevices"
},
"camera_specs": {
"gear_lens": "Sony A7R IV, 85mm f/1.4 GM lens for subject separation",
"aperture_iso": "f/2.8, ISO 50, 1/8000s shutter speed",
"film_finish": "Kodak Ektar 100 simulation, high contrast, vibrant saturation, sharp digital clarity, slight chromatic aberration at edges"
}
}
{
"input_image": "User-provided product photo",
"resolution": "8K UHD",
"image_style": "hyper-realistic commercial product photography",
"global_settings": {
"quality": "Ultra-high detail, sharp focus, premium advertising quality",
"lighting": "Controlled studio lighting emphasizing internal textures and contents",
"camera": "High-speed photography look, shallow to medium depth of field",
"motion": "Frozen mid-air action, cinematic energy",
"post_processing": "Balanced contrast, natural saturation, clean finish"
},
"scene": {
"main_subject": {
"description": "User-provided product with realistically opened packaging",
"position": "Centered, hero shot",
"state": "Packaging opening according to its real-world mechanism, revealing contents dynamically",
"integrity": "Packaging structure and branding remain intact and readable"
},
"opening_effects": {
"style": "Realistic commercial opening action",
"mechanism": "Opening method strictly matches the product type (pull tab, tear seal, wrapper peel, lid removal, cap twist, seal break)",
"elements": "Contents, ingredients, fragments, liquids or particles emerging naturally from inside",
"motion": "Physically accurate movement, frozen mid-action"
}
},
"background": {
"style": "Clean studio background or smooth gradient",
"color": "High contrast with both packaging and contents",
"depth": "Subtle separation to highlight internal elements"
},
"rules": {
"content_focus": "Internal product is visually dominant",
"realism": "No unrealistic cuts, breaks or openings; no destructive packaging behavior",
"branding": "Logos, labels and package details remain sharp and legible",
"no_artifacts": "No AI distortions, warping or unnatural shapes"
},
"goal": {
"mood": "Energetic, appetizing, premium",
"usage": "Advertising, packaging reveal, product launch",
"priority": "Contents first, packaging enhances and frames the reveal"
}
}
Dark Background
{
"meta": {
"style": "8k raw photo, avant-garde fashion, hyper-detailed, volcanic aesthetic, cinematic composition",
"creativity_temp": 1.8
},
"subject": {
"identity": "High-fashion model, angular bone structure, skin texture detailed with thermal perspiration and microscopic volcanic soot particles.",
"pose_action": "Standing rigid against wind, arms extended to display billowing sleeves, dynamic fabric simulation.",
"material_detail": "Sheer organza blouse, oversized ruffled architecture. PBR properties: high transmission, subsurface scattering turning fabric glowing ember-orange from beneath. Micro-details: intricate silk weave, tiny singe marks where sparks land, translucent layering revealing silhouette."
},
"environment": {
"location": "Cooling basalt ridge, geometric hexagonal rock formations, jagged obsidian ground.",
"background_elements": "Fissures of molten lava, distant pyroclastic flow.",
"atmosphere": "Heavy volumetric ash clouds, grey particulate suspension, heat haze shimmering (refraction index 1.0003), choked sky, oppressive density."
},
"lighting": {
"source_angle": "Strong up-lighting from ground fissures (subterranean glow), soft diffuse top-down light from overcast sky.",
"kelvin_quality": "Contrast between 1200K (magma red/orange) and 7500K (ash grey).",
"visual_effects": "Lumen global illumination, ray-traced translucency through fabric, ember sparks acting as micro-point lights, bloom on lava veins."
},
"camera_specs": {
"gear_lens": "Phase One XF IQ4 150MP, Schneider Kreuznach 110mm LS f/2.8 Blue Ring.",
"aperture_iso": "f/1.8 for shallow depth of field, ISO 50, 1/4000s shutter to freeze fabric ripple.",
"film_finish": "Kodak Ektar 100 emulation, high dynamic range, crushed shadows, vibrant thermal highlights, chromatic aberration on edges."
}
}
A highly detailed digital illustration with a hyper-realistic, scientific visualization style, combining research-grade anatomical accuracy with advanced military targeting system aesthetics. Clean, high-contrast rendering with precise line work, layered data overlays, glowing vector paths, and sensor-style annotations. Dark, minimal background to enhance readability and focus. Balanced lighting with controlled glow intensity, no artistic distortion, no exaggeration. The composition follows an analytical, documentary tone, presenting the subject as a system under observation rather than a narrative scene, maintaining clinical respect and technical clarity. Subject: a futuristic motorcycle.
A beautiful black woman dressed in urban clothes and makeup. Full-body shot.
Generate a cinematic image from the scene in the bottom-left. The movie genre is Action, Thriller, and the visual style is Futuristic, Tron movie style, Realistic. The scene reflects the mood, tension, and atmosphere typical of the movie genre. Visual language inspired by the visual style, with detailed environments, realistic materials, and strong depth. High detail, sharp focus, cinematic framing, film still quality.
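This prompt reads like a fixed template with genre and style slots. When such slots are filled by pasting list values, stray punctuation is easy to introduce. A minimal sketch of filling the slots cleanly, assuming genres and styles arrive as lists of strings; the `fill_scene_prompt` helper and its parameters are hypothetical.

```python
def fill_scene_prompt(genres: list[str], styles: list[str], region: str = "bottom-left") -> str:
    """Fill a genre/style prompt template without stray punctuation."""
    # Join non-empty list values with commas; blank entries are dropped.
    genre_text = ", ".join(g.strip() for g in genres if g.strip())
    style_text = ", ".join(s.strip() for s in styles if s.strip())
    return (
        f"Generate a cinematic image from the scene in the {region}. "
        f"The movie genre is {genre_text}, and the visual style is {style_text}. "
        "The scene reflects the mood, tension, and atmosphere typical of the movie genre."
    )


print(fill_scene_prompt(["Action", "Thriller"], ["Futuristic", "Tron movie style", "Realistic"]))
```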
Make a full body turnaround sheet of this exact character. Four full-body poses on a pure white background: Front view, left profile, back view, and right profile. Evenly spaced in a horizontal row and consistent style.
{
"meta": {
"style": "8k raw photo, hyper-realistic, cinematic sci-fi, unreal engine 5 render, masterpiece",
"creativity_temp": 1.8
},
"subject": {
"identity": "Prehistoric primate, Australopithecus physique, fur matted with jagged rime ice and hoarfrost crystals",
"pose_action": "Finger making contact with the cold silicon surface, muscles tensed, breath visible as vapor",
"material_detail": "Monolith: Translucent sapphire-blue silicon wafer, nanometer-scale gold-etched circuitry visible deep within, refractive index 1.77, smooth glass-like surface with sub-surface scattering at touch point"
},
"environment": {
"location": "Frozen glacial tundra, jagged permafrost terrain, pool of melted water at monolith base",
"background_elements": "Holographic galaxy projection expanding from contact point, sky fracturing into geometric Voronoi shards, digital glitch artifacts blending with clouds",
"atmosphere": "Dense freezing fog, suspended ice particles, steam rising from the thermal reaction, volumetric density"
},
"lighting": {
"source_angle": "Low angle upward cast from the monolith's internal glow, ambient glacial twilight from above",
"kelvin_quality": "Internal Pulse: Electric Cyan (9000K), Ambient: Deep Arctic Blue (7500K), Hologram: Spectrum White",
"visual_effects": "Ray-traced caustics on wet ice, anamorphic lens flares, chromatic aberration on sky fractures, global illumination"
},
"camera_specs": {
"gear_lens": "Phase One XF IQ4 150MP, Rodenstock 28mm wide-angle lens",
"aperture_iso": "f/8, ISO 50, 1/250s shutter",
"film_finish": "Kodak Ektar 100 simulation, bleach bypass, high contrast, sharp texture detail, 8k resolution"
}
}
A cinematic portrait of an elderly Japanese fisherman standing on a quiet harbor at dawn, his weathered face marked by deep wrinkles and salt-stained skin. Soft golden morning light illuminates his features from the side, casting long shadows and highlighting the texture of his beard and worn jacket. The background shows calm water, wooden boats gently floating, and mist rising from the sea. Shot with shallow depth of field, ultra-realistic, 85mm lens look, natural color grading, calm and contemplative mood.
A dramatic close-up portrait of a young woman with freckles and emerald-green eyes, standing in heavy rain at night. Neon city lights reflect off the wet pavement behind her, creating vibrant bokeh in shades of cyan and magenta. Raindrops cling to her hair and eyelashes, illuminated by a strong backlight. High contrast lighting, cinematic cyberpunk atmosphere, ultra-detailed skin texture, emotional and intense expression.
An ultra-detailed macro photograph of a jumping spider’s face, capturing its multiple reflective eyes and fine hair texture. Natural daylight softly illuminates the subject, revealing vivid colors and microscopic details. Shallow depth of field with a smooth green background blur. Hyper-realistic macro photography, scientific yet visually striking.
An extreme macro photograph of a precision mechanical watch movement, showing interlocking gears, micro-screws, and polished metal components. Every surface reveals fine machining marks and subtle reflections. Soft diffused studio lighting highlights the metallic textures, and the depth of field is razor-thin, with only a few gears in sharp focus. Ultra-high detail, photorealistic macro photography, technical and elegant mood.
A futuristic armored soldier sprinting across a high-tech command center while holographic interfaces flicker around him. The camera is positioned low and slightly tilted, creating a dynamic diagonal composition. Motion blur trails behind his moving limbs, while his helmet and weapon remain sharply in focus. Blue and teal lights streak across the scene, with sparks and floating particles enhancing the sense of speed. Cinematic action shot, AAA video game key art, intense and tactical atmosphere.
A woman in a metallic fashion dress captured mid-spin in a minimalist studio. The camera is angled slightly above and off-center, emphasizing the flowing movement of the fabric. The dress creates sweeping motion lines as it catches the light, while her body remains sharply defined. Strong directional lighting sculpts the form, with soft shadows trailing behind the movement. High-fashion editorial photography, energetic and expressive mood.
A dark fantasy warrior mid-swing with a massive sword inside a ruined cathedral. The camera captures the scene from a low-angle perspective, close to the ground, emphasizing power and momentum. The warrior’s cape and hair flow through the air, with debris and dust kicked up around his feet. Moonlight cuts through broken arches, creating dramatic highlights and deep shadows. Dynamic action pose, epic fantasy game art, violent and heroic energy.
Create a full character turnaround sheet featuring the front, side, and back views of this character in a neutral A-pose (arms slightly away from the body, palms facing inward), standing. Arrange the three views horizontally in a single row, each view centered inside an equally sized frame, with identical spacing between views to allow precise, uniform cropping. Each view must be clearly separated.
Ensure the low poly style, proportions, colors, and surface details remain fully consistent across all angles. Maintain the same visual language, simplified geometry, and stylized realism seen in the reference.
Display the turnaround on a clean, neutral grey background with soft, even studio lighting to clearly show form, volume, and silhouette. Lighting and camera distance must remain identical for all views.
Remove all weapons or items from the character’s hands. No props. Focus on clear anatomy, strong silhouettes, readable shapes, and game-ready presentation suitable for modeling, rigging, and animation.
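Because this prompt asks for equally sized frames with identical spacing, the crop regions can be computed instead of drawn by hand. A sketch, assuming the generated sheet has no outer margin and a known pixel gap between frames (verify both against the actual output before cropping). The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects; `crop_boxes` itself is a hypothetical helper. The same function covers the four-view sheet above with `views=4`.

```python
def crop_boxes(sheet_width: int, sheet_height: int, views: int = 3, gap: int = 0) -> list[tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) boxes for equal frames in one row.

    Assumes frames span the full sheet height, `gap` pixels separate
    neighbouring frames, and there is no outer margin. Leftover pixels from
    integer division are left uncovered at the right edge.
    """
    if views < 1:
        raise ValueError("views must be at least 1")
    usable = sheet_width - gap * (views - 1)
    if usable < views:
        raise ValueError("sheet too narrow for the requested views and gap")
    frame_width = usable // views
    boxes = []
    for i in range(views):
        left = i * (frame_width + gap)
        boxes.append((left, 0, left + frame_width, sheet_height))
    return boxes
```

For a 3072x1024 sheet with no gap, `crop_boxes(3072, 1024)` yields three 1024-pixel-wide frames starting at x = 0, 1024, and 2048.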
A high-bitrate sports broadcast still from an ultra-wide F1 on-board T-Cam, capturing a car violently attacking the Sainte-Dévote corner at the Monaco Grand Prix. The scene is dominated by extreme motion blur on the track surface, yellow-and-red curbing, and Armco barriers due to a slow shutter speed (1/60s look). Heat haze ripples visibly from the exhaust, carbon-fiber mirrors vibrate, and the front brake discs glow bright red. The steering wheel display clearly reads 'GEAR 4' and 'RPM 12000'. Harsh midday Mediterranean sunlight creates high-contrast shadows, glinting off metallic paint and showing realistic reflections on the driver's helmet visor. The background is a blur of the harbor and luxury yachts. Tire marbles and skid marks cover the track.
A sleek cybernetic panther stalks across a rain-soaked neon alley at midnight. Its matte black, segmented metal plating is accented by glowing blue circuit lines that pulse rhythmically along the muscles.
Droplets of rain glisten on the panther’s body, catching the shifting hues from overhead holographic billboards. Its eyes shine with a sharp, intelligent turquoise glow and low, electrical vapor drifts from its nostrils as it exhales.
The alley is lined with wet concrete, reflecting vibrant signs in pink, cyan, and green. Tense, cinematic composition with dramatic, angled lighting casts elongated shadows and intricate reflections beneath padded, clawed paws. No text appears anywhere in the image.
A voxel-style mech charging forward with heavy mechanical steps, captured from a low-angle perspective. Each movement displaces voxel debris and dust. Directional lighting creates dramatic contrast across the blocky forms. Modern voxel game aesthetic, powerful and kinetic.
About the Provider
Grok Imagine Image Pro is developed by xAI, an AI company founded by Elon Musk. The model is part of xAI's broader effort to make powerful generative tools accessible to the creative community.
Related models
Grok Imagine Video
Generate and edit videos using xAI's Grok model — supports text-to-video, image-to-video (up to 15s), and video editing (up to 8.7s) with 480p/720p resolution and flexible aspect ratios
Grok Imagine Image
Generate and edit images using xAI's Grok model, powered by Aurora — supports text-to-image generation and image editing with multiple aspect ratios and batch output up to 10 images
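The limits listed for the related models (clip lengths, resolutions, batch size) are easy to check client-side before sending a request. A sketch, assuming a request is described by a mode string, a duration in seconds, and a resolution label; these names and the helper functions are hypothetical and do not come from a documented API. No duration limit is stated for text-to-video, so that mode is not length-checked here.

```python
# Limits as stated in the model descriptions above; key names are hypothetical.
VIDEO_MAX_SECONDS = {"image-to-video": 15.0, "video-edit": 8.7}
VIDEO_RESOLUTIONS = {"480p", "720p"}
IMAGE_MAX_BATCH = 10


def check_video_request(mode: str, seconds: float, resolution: str) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    # Modes without a stated limit (e.g. text-to-video) are not length-checked.
    limit = VIDEO_MAX_SECONDS.get(mode)
    if limit is not None and seconds > limit:
        problems.append(f"{mode} supports up to {limit}s, got {seconds}s")
    if resolution not in VIDEO_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(VIDEO_RESOLUTIONS)}")
    return problems


def check_image_batch(count: int) -> list[str]:
    """Check an image batch size against the stated maximum of 10."""
    if not 1 <= count <= IMAGE_MAX_BATCH:
        return [f"batch size must be between 1 and {IMAGE_MAX_BATCH}, got {count}"]
    return []
```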
