Full-length song generation (up to ~2 minutes) with song structure, vocals, and optional image-to-music. Output as MP3 or WAV.