Gemini Omni Flash by Google turns text (or a first-frame image) into 720p video with audio generated in the same pass: dialogue, ambience, and sound effects arrive together, so you skip separate voice-over and sound-design steps. Clips run 3 to 10 seconds in widescreen (16:9) or vertical (9:16). Supply a starter image to control the first frame, or use text alone to let the model compose every frame. Best for social clips, ad shorts, game trailers, and any beat where audio sells the moment.
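If you are wiring this model into a pipeline, it can help to check requests against the stated limits before sending them. The sketch below is a hypothetical client-side validator, not Google's API: the `ClipRequest` class and its field names are illustrative assumptions. It also assumes "720p" means a 720-pixel short side (1280x720 widescreen, 720x1280 vertical), which the description does not confirm.

```python
from dataclasses import dataclass
from typing import Optional

# Limits taken from the description: 3 to 10 second clips, 16:9 or 9:16.
MIN_SECONDS = 3
MAX_SECONDS = 10
# Assumption: "720p" means the short side is 720 px. Not confirmed by the source.
FRAME_SIZES = {
    "16:9": (1280, 720),  # widescreen: width x height
    "9:16": (720, 1280),  # vertical: width x height
}


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical request shape; field names are illustrative, not a real API."""
    prompt: str
    duration_seconds: int
    aspect_ratio: str = "16:9"
    first_frame_image: Optional[str] = None  # optional path to a starter image

    def validate(self) -> None:
        # The model accepts text or a first-frame image, so require at least one.
        if not self.prompt.strip() and self.first_frame_image is None:
            raise ValueError("provide a text prompt or a first-frame image")
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            raise ValueError(
                f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, "
                f"got {self.duration_seconds}"
            )
        if self.aspect_ratio not in FRAME_SIZES:
            raise ValueError(f"aspect_ratio must be one of {sorted(FRAME_SIZES)}")

    def frame_size(self) -> tuple[int, int]:
        # Validate first, then return the assumed (width, height) in pixels.
        self.validate()
        return FRAME_SIZES[self.aspect_ratio]


req = ClipRequest(
    prompt="A skateboarder lands a kickflip; wheels clack on concrete",
    duration_seconds=8,
    aspect_ratio="9:16",
)
print(req.frame_size())  # (720, 1280)
```

Catching an out-of-range duration or an unsupported aspect ratio locally avoids spending a generation call on a request that would be rejected or silently adjusted.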