Tips for using the Stable Audio 2.5 Audio-to-Audio API
Overview of Stable Audio 2.5 Audio-to-Audio usage and recommendations for starting parameter values and prompts.
Stable Audio 2 — Audio-to-Audio Endpoint
Overview
The Stable Audio 2 audio-to-audio endpoint allows you to transform existing audio using a combination of an input file and a descriptive text prompt. This enables creative workflows such as stylistic reinterpretation, remixing, and sound design while preserving elements of the original audio.
This endpoint is ideal when you want to guide the model using both source audio and detailed sonic characteristics.
Endpoint
POST /v2beta/audio/stable-audio-2/audio-to-audio
Submits an audio-to-audio generation request using an input audio file, a text prompt, and optional generation parameters such as strength.
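As a rough illustration of what such a request might look like, here is a minimal Python sketch. Only the endpoint path comes from this page. The host, the `Authorization` and `Accept` headers, and the form field names (`audio`, `prompt`, `strength`) are assumptions, as is the 0 to 1 range check on strength, so verify them against the official API reference before relying on them. `build_form_fields` and `submit` are hypothetical helper names, not part of any SDK.

```python
from pathlib import Path

# Assumption: the host is illustrative; only the path is taken from this page.
API_HOST = "https://api.stability.ai"
ENDPOINT_PATH = "/v2beta/audio/stable-audio-2/audio-to-audio"


def build_form_fields(prompt: str, strength: float = 0.8) -> dict:
    """Return the text fields of the multipart form (field names are assumed)."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    if not 0.0 <= strength <= 1.0:  # assumed valid range for strength
        raise ValueError("strength is assumed to lie in [0, 1]")
    return {"prompt": cleaned, "strength": str(strength)}


def submit(audio_path: str, prompt: str, strength: float, api_key: str):
    """Send the request; needs the third-party `requests` package."""
    import requests  # imported lazily so build_form_fields works without it

    with Path(audio_path).open("rb") as audio_file:
        return requests.post(
            API_HOST + ENDPOINT_PATH,
            # Assumed auth scheme and response type negotiation.
            headers={"Authorization": f"Bearer {api_key}", "Accept": "audio/*"},
            files={"audio": audio_file},  # assumed field name for the input file
            data=build_form_fields(prompt, strength),
            timeout=300,
        )
```

Keeping the field construction separate from the network call makes it easy to validate the prompt and strength locally before spending a request.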
How It Works
- Provide Input Audio
  Upload an existing audio file to serve as the structural and tonal foundation for generation.
- Write a Descriptive Prompt
  Use descriptive language that specifies sonic qualities, instrumentation, mood, and texture rather than instructive phrasing.
  ✅ Recommended:
  - “Distorted metal guitar, driving bassline, punchy drums, aggressive energy”
  - “Warm analog synth pads, soft vinyl crackle, mellow lo-fi beat”
  - “Cinematic strings, swelling brass, dramatic percussion”
  ❌ Avoid:
  - “Make this into a metal track”
  - “Turn this into lo-fi”
  - “Convert to orchestral style”
  Descriptive prompts give the model clearer acoustic targets and produce more consistent, controllable results.
- Adjust the strength Parameter
  The strength parameter controls how much the output diverges from the original input audio.
  - A good starting point is strength: 0.8.
  - Increase slightly (e.g., 0.85–0.9) if the output is too similar to the original.
  - Decrease slightly (e.g., 0.6–0.75) if the output deviates too much from the input.
  Finding the ideal result typically requires experimentation with both:
  - Prompt specificity
  - Strength value
  Expect to iterate on both parameters to dial in the desired balance between preservation and transformation.
- Receive Output Audio
  The API returns the generated audio file or a reference to retrieve it once processing is complete.
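The prompt and strength guidance in the steps above can be sketched as a few small helpers for an iteration loop. Everything here is hypothetical and for illustration only: the function names, the list of instructive opening words, the 0.05 step size, and the assumption that strength is clamped to the range 0 to 1 are not taken from the API.

```python
from typing import Iterable

# Hypothetical heuristic: imperative openers that suggest instructive phrasing.
INSTRUCTIVE_OPENERS = ("make ", "turn ", "convert ", "change ", "transform ")


def looks_instructive(prompt: str) -> bool:
    """Flag prompts that open with a command instead of a description."""
    return prompt.strip().lower().startswith(INSTRUCTIVE_OPENERS)


def build_prompt(descriptors: Iterable[str]) -> str:
    """Join sonic descriptors into one comma-separated descriptive prompt."""
    return ", ".join(d.strip() for d in descriptors if d.strip())


def next_strength(current: float, feedback: str, step: float = 0.05) -> float:
    """Nudge strength up if the output is too similar to the input,
    down if it deviates too much. The result is clamped to [0, 1] (assumed range)."""
    if feedback == "too_similar":
        candidate = current + step
    elif feedback == "too_different":
        candidate = current - step
    else:
        raise ValueError("feedback must be 'too_similar' or 'too_different'")
    return round(min(1.0, max(0.0, candidate)), 2)
```

A typical loop would start at a strength of 0.8 with a prompt from `build_prompt`, listen to the result, and call `next_strength` with whichever feedback applies before resubmitting. The opener check is only a rough lint: it catches phrasing like the "Avoid" examples above but cannot judge whether a prompt is descriptive enough.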