The Embeddings endpoint converts text into dense numerical vectors that capture semantic meaning. You can use these vectors to power semantic search, build retrieval-augmented generation (RAG) pipelines, cluster similar content, or compute similarity scores between pieces of text. NexLLM’s embeddings endpoint is OpenAI-compatible, so an existing OpenAI embeddings workflow should carry over once you set the client's base URL and API key, as in the code example on this page.

Endpoint

POST https://www.nexllm.ai/v1/embeddings

Request Parameters

model
string
required
The embedding model to use. Examples: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
input
string | array
required
The text to embed. Pass a single string for one embedding, or an array of strings to embed multiple texts in a single request. Batching multiple inputs is more efficient than making separate requests.
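
For callers not using the OpenAI SDK, the two parameters above map onto a JSON request body. The following is a minimal sketch using only the Python standard library; it builds the request without sending it. The Bearer-token Authorization header is an assumption based on the endpoint being OpenAI-compatible, and build_embeddings_request is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://www.nexllm.ai/v1/embeddings"  # endpoint listed on this page


def build_embeddings_request(api_key, model, texts):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    # `input` may be a single string or a list of strings.
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth, as used by the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embeddings_request(
    "sk-xxxxxxxxxxxxxxxx",  # placeholder key
    "text-embedding-3-small",
    ["first text", "second text"],
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Passing a list here sends both texts in one request, which is the batching behaviour described for input.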

Response Fields

data
array
An array of embedding objects, one for each input string. The order matches the order of the inputs you provided.
data[].embedding
array of floats
The embedding vector for this input, as an array of floating-point numbers. Its dimensionality depends on the model you chose.
data[].index
integer
The zero-based index of this embedding in the data array, corresponding to the position of the input string you submitted.
usage.prompt_tokens
integer
The total number of tokens processed across all inputs in the request.
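
The fields above can be read as in the following sketch. It uses a hand-built stand-in object with made-up 3-dimensional vectors rather than a real API response, and pair_inputs_with_embeddings is a hypothetical helper. The stand-in items are deliberately out of order to show index doing the matching; per the description above, real responses should already be in input order, so the sort is only defensive.

```python
from types import SimpleNamespace


def pair_inputs_with_embeddings(inputs, response):
    """Return (text, vector) pairs, matching each item to its input by index."""
    ordered = sorted(response.data, key=lambda item: item.index)
    return [(inputs[item.index], item.embedding) for item in ordered]


# Stand-in for a real response; the vectors and token count are invented.
fake_response = SimpleNamespace(
    data=[
        SimpleNamespace(index=1, embedding=[0.4, 0.5, 0.6]),
        SimpleNamespace(index=0, embedding=[0.1, 0.2, 0.3]),
    ],
    usage=SimpleNamespace(prompt_tokens=7),
)

pairs = pair_inputs_with_embeddings(["alpha", "beta"], fake_response)
for text, vector in pairs:
    print(f"{text}: {len(vector)} dimensions")
print(f"Prompt tokens: {fake_response.usage.prompt_tokens}")
```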

Code Example

from openai import OpenAI

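# Point the OpenAI client at NexLLM instead of the default OpenAI endpoint.
client = OpenAI(
    api_key="sk-xxxxxxxxxxxxxxxx",  # placeholder; replace with your API key
    base_url="https://www.nexllm.ai/v1",
)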

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

print(response.data[0].embedding)

To embed multiple strings at once, pass a list as input. This snippet reuses the client created in the first example:

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "The quick brown fox jumps over the lazy dog",
        "Pack my box with five dozen liquor jugs"
    ]
)

for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)}-dimension vector")

When building a RAG pipeline, embed your document chunks once at ingestion time and store the vectors in a vector database (e.g. Pinecone, Weaviate, or pgvector). At query time, embed the user’s question and perform a nearest-neighbour search to retrieve relevant chunks before calling the Chat Completions endpoint.
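
The retrieval step can be sketched without a vector database. The example below is an illustration only: it keeps vectors in a plain Python list, ranks them by cosine similarity (one common choice of similarity measure, assumed here), and uses toy 3-dimensional vectors in place of real embeddings. The names cosine_similarity and top_k are hypothetical helpers, not part of any API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the k (chunk, score) pairs most similar to the query."""
    scored = [
        (chunk, cosine_similarity(query_vector, vector))
        for chunk, vector in store
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for embeddings stored at ingestion time.
store = [
    ("chunk about cats", [1.0, 0.0, 0.0]),
    ("chunk about dogs", [0.8, 0.6, 0.0]),
    ("chunk about tax law", [0.0, 0.0, 1.0]),
]

# Toy vector standing in for the embedded user question.
query_vector = [0.9, 0.1, 0.0]

results = top_k(query_vector, store, k=2)
for chunk, score in results:
    print(f"{score:.3f}  {chunk}")
```

A brute-force scan like this is linear in the number of stored chunks, which is why the vector databases mentioned above are the usual choice beyond small collections.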