NexLLM’s audio endpoints give you access to speech-to-text transcription and text-to-speech synthesis through a single, OpenAI-compatible interface. Use the transcriptions endpoint to convert audio recordings into text, and the speech endpoint to turn written content into natural-sounding audio — all with the same API key you use for chat and embeddings.

Endpoints

Endpoint | Path | Description
Speech-to-Text | POST /v1/audio/transcriptions | Transcribe an audio file to text
Text-to-Speech | POST /v1/audio/speech | Convert text to spoken audio

Speech-to-Text: Transcriptions

Parameters

model (string, required)
The transcription model to use. Use whisper-1 for Whisper-compatible transcription.

file (file, required)
The audio file to transcribe. Accepted formats include mp3, mp4, mpeg, mpga, m4a, wav, and webm. The file must be under 25 MB.

language (string, optional)
The ISO-639-1 language code of the audio (e.g. en, es, fr). Providing this improves accuracy and speed. If omitted, the model detects the language automatically.

prompt (string, optional)
Optional text to guide the model’s transcription style or provide context about the audio content.
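
The format and size limits on the file parameter can be checked locally before uploading, which avoids a wasted request. The following is a minimal sketch, not part of the NexLLM API: check_audio_file is a hypothetical helper, and it assumes "25 MB" means 25 * 1024 * 1024 bytes, which the documentation above does not specify.

```python
from pathlib import Path
from typing import List, Optional

# Formats listed in the description of the file parameter.
ACCEPTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str, size_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    p = Path(path)
    ext = p.suffix.lower().lstrip(".")
    if ext not in ACCEPTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    # Use the caller-supplied size if given, otherwise read it from disk.
    size = size_bytes if size_bytes is not None else p.stat().st_size
    # The file must be strictly under the limit.
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; must be under {MAX_BYTES}")
    return problems
```

Calling check_audio_file("audio.mp3") before the upload returns an empty list when the file passes both checks. The check uses only the file extension, so it does not verify the actual audio encoding.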

Text-to-Speech

Parameters

model (string, required)
The text-to-speech model to use. Use tts-1 for standard quality or tts-1-hd for higher quality audio.

input (string, required)
The text to convert to speech. Maximum length is 4,096 characters.

voice (string, required)
The voice to use for synthesis. Available options: alloy, echo, fable, onyx, nova, shimmer.

response_format (string, optional)
The audio format of the output. Supported values: mp3, opus, aac, flac. Defaults to mp3.
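
The parameters above map onto a speech request as sketched below. This is an illustration under assumptions rather than an official client: build_speech_request and synthesize_to_file are hypothetical helpers, the NEXLLM_API_KEY environment variable is an invented name, and the sketch assumes the endpoint accepts the OpenAI Python SDK's streaming speech call in the same way OpenAI's own API does.

```python
import os
from typing import Dict

# Allowed values taken from the parameter descriptions above.
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}
MAX_INPUT_CHARS = 4096


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1",
                         response_format: str = "mp3") -> Dict[str, str]:
    """Validate the arguments locally and return the request parameters."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    return {"model": model, "input": text, "voice": voice,
            "response_format": response_format}


def synthesize_to_file(text: str, out_path: str, **options: str) -> None:
    """Send the request and stream the audio to out_path."""
    # Imported here so the validation helper works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["NEXLLM_API_KEY"],  # hypothetical variable name
        base_url="https://www.nexllm.ai/v1",
    )
    request = build_speech_request(text, **options)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)
```

For example, synthesize_to_file("Hello from NexLLM.", "hello.mp3", voice="nova") would write an MP3 file, provided the assumptions above hold. Validating locally first means an invalid voice or an over-long input fails before any network call is made.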

Code Examples

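# Transcribe a local audio file using the OpenAI Python SDK pointed at NexLLM.
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxxxxxxxxxx",  # replace with your NexLLM API key
    base_url="https://www.nexllm.ai/v1"
)

# Open the file in binary mode so the SDK can upload the raw bytes.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

# The transcribed text is available on the .text attribute.
print(transcript.text)

For long-form content such as articles or podcast scripts, split the text into segments that each fit within the 4,096-character input limit before calling the speech endpoint. You can then process the segments in parallel and combine the output files in order, which reduces overall latency.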
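
One way to approach the splitting and parallel processing described above is sketched here. The names split_text and synthesize_all are hypothetical, and the synthesize argument stands in for whatever function actually calls the speech endpoint and returns audio bytes. The sketch splits on sentence-ending punctuation, which is a simplification that will not suit every text.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_INPUT_CHARS = 4096  # input limit stated for the speech endpoint


def split_text(text: str, limit: int = MAX_INPUT_CHARS) -> List[str]:
    """Greedily pack whole sentences into segments of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut into pieces.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


def synthesize_all(segments: List[str], synthesize: Callable[[str], bytes],
                   max_workers: int = 4) -> List[bytes]:
    """Run `synthesize` on each segment in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so audio chunks line up with the text.
        return list(pool.map(synthesize, segments))
```

Threads fit here because each call spends most of its time waiting on the network. How the resulting audio chunks are combined depends on the output format, so that step is left out of the sketch.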