Endpoints
| Endpoint | Method | Path | Description |
|---|---|---|---|
| Speech-to-Text | POST | /v1/audio/transcriptions | Transcribe an audio file to text |
| Text-to-Speech | POST | /v1/audio/speech | Convert text to spoken audio |
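A minimal Python sketch of resolving the two endpoints above against a server root. The base URL `https://api.example.com` and the helper `endpoint_url` are hypothetical; this document does not state the host.

```python
from typing import Tuple
from urllib.parse import urljoin

# Method and path for each endpoint, as listed in the table.
ENDPOINTS = {
    "transcriptions": ("POST", "/v1/audio/transcriptions"),
    "speech": ("POST", "/v1/audio/speech"),
}


def endpoint_url(base_url: str, name: str) -> Tuple[str, str]:
    """Return (method, absolute URL) for a named endpoint.

    base_url is a hypothetical server root such as "https://api.example.com".
    """
    method, path = ENDPOINTS[name]
    # An absolute path replaces any path already on base_url,
    # so pass only scheme://host[:port] here.
    return method, urljoin(base_url, path)
```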
Speech-to-Text: Transcriptions
Parameters
- The transcription model to use. Use `whisper-1` for Whisper-compatible transcription.
- The audio file to transcribe. Accepted formats include `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, and `webm`. The file must be under 25 MB.
- The ISO-639-1 language code of the audio (e.g. `en`, `es`, `fr`). Providing this improves accuracy and speed. If omitted, the model detects the language automatically.
- Optional text to guide the model’s transcription style or provide context about the audio content.
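The following is a standard-library Python sketch that checks an upload against the format and size limits above and encodes a `multipart/form-data` body. It assumes the endpoint accepts multipart uploads and that the form fields are named `model`, `file`, `language`, and `prompt`, as in OpenAI-style APIs. This document does not name the fields, so treat those names, and the helpers `check_audio` and `build_transcription_form`, as hypothetical.

```python
import os
import uuid
from typing import Optional, Tuple

# Limits taken from the parameter list above.
ACCEPTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the service may differ.
MAX_BYTES = 25 * 1024 * 1024


def check_audio(filename: str, size_bytes: int) -> str:
    """Validate an upload by extension and size; return the lowercase extension."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in ACCEPTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if size_bytes >= MAX_BYTES:
        raise ValueError("audio file must be under 25 MB")
    return ext


def build_transcription_form(filename: str, audio: bytes, model: str = "whisper-1",
                             language: Optional[str] = None,
                             prompt: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a multipart/form-data body; return (body, Content-Type header).

    Hypothetical field names: model, file, language, prompt.
    """
    check_audio(filename, len(audio))
    fields = {"model": model}
    if language is not None:
        # ISO-639-1 codes are two ASCII letters, e.g. "en".
        if len(language) != 2 or not language.isascii() or not language.isalpha():
            raise ValueError(f"not an ISO-639-1 code: {language!r}")
        fields["language"] = language.lower()
    if prompt is not None:
        fields["prompt"] = prompt

    boundary = uuid.uuid4().hex
    chunks = []
    # One part per text field.
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks += [header.encode(), value.encode(), b"\r\n"]
    # The audio bytes go in a file part, followed by the closing boundary.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks += [file_header.encode(), audio, b"\r\n", f"--{boundary}--\r\n".encode()]
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The body and header could then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`. Authentication is not covered in this document. Validating locally catches oversized or unsupported files before any upload is attempted.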
Text-to-Speech
Parameters
- The text-to-speech model to use. Use `tts-1` for standard quality or `tts-1-hd` for higher-quality audio.
- The text to convert to speech. Maximum length is 4,096 characters.
- The voice to use for synthesis. Available options: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
- The audio format of the output. Supported values: `mp3`, `opus`, `aac`, `flac`. Defaults to `mp3`.
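A minimal Python sketch that checks text-to-speech parameters against the limits above and serializes a JSON request body. The JSON keys `model`, `input`, `voice`, and `response_format` are assumptions modeled on OpenAI-style APIs, since this document does not name the fields. The helper `build_speech_request` is hypothetical.

```python
import json

# Allowed values taken from the parameter list above.
TTS_MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}
MAX_INPUT_CHARS = 4096


def build_speech_request(text: str, voice: str, model: str = "tts-1",
                         response_format: str = "mp3") -> bytes:
    """Validate TTS parameters and return a UTF-8 JSON request body.

    Hypothetical keys: model, input, voice, response_format.
    """
    if model not in TTS_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # len() counts Unicode code points; the service may count characters differently.
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; the maximum is {MAX_INPUT_CHARS}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format}
    return json.dumps(payload).encode("utf-8")
```

The default of `mp3` mirrors the documented default, so sending it explicitly should not change the result. Omitting the key and relying on the server default would also work under the same assumptions.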