https://www.nexllm.ai/v1 and swap in your NexLLM key.
Endpoint
Request Parameters
The ID of the model to use. NexLLM routes your request to the correct provider automatically. Examples:
gpt-4o, aws/claude-haiku-4-5, gemini-2.5-flash.An array of message objects that make up the conversation history. Each object must include a
role (system, user, or assistant) and a content string.The maximum number of tokens the model should generate in its response. Defaults to the model’s configured maximum if omitted.
When set to
true, the API streams the response as Server-Sent Events (SSE) instead of returning a single JSON response. Defaults to false.Controls the randomness of the output. Accepts a value between
0 and 2. Lower values (e.g. 0.2) produce more deterministic responses; higher values (e.g. 1.5) produce more varied output. Defaults to 1.Response Fields
A unique identifier for this completion request, useful for logging and debugging.
An array of generated response objects. Most requests return a single choice.
The text generated by the model for this choice.
The number of tokens consumed by the input messages.
The number of tokens generated in the model’s response.
Code Examples
Streaming Responses
Set
stream: true in your request body to receive the response as a stream of Server-Sent Events. Each event contains a partial delta of the generated text. This is useful for displaying output to users in real time as the model generates it. The OpenAI Python SDK handles SSE streaming automatically when you pass stream=True to the create call.