Embeddings API: Convert Text to Vectors with NexLLM

The Embeddings endpoint converts text into dense numerical vectors that capture semantic meaning. You can use these vectors to power semantic search, build retrieval-augmented generation (RAG) pipelines, cluster similar content, or compute similarity scores between pieces of text. NexLLM’s embeddings endpoint is OpenAI-compatible, so an existing embeddings workflow built on the OpenAI SDK should only need its base URL and API key changed.

Endpoint

POST https://www.nexllm.ai/v1/embeddings
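If you are not using an SDK, the request is a plain JSON POST. The sketch below only builds the request with Python's standard library so you can see its shape. It assumes NexLLM accepts the same Authorization: Bearer header and JSON body as the OpenAI API; the SDK example on this page implies that, but the page does not state it. The function name build_embeddings_request is hypothetical.

```python
import json
import urllib.request
from typing import List, Union

EMBEDDINGS_URL = "https://www.nexllm.ai/v1/embeddings"


def build_embeddings_request(
    api_key: str, model: str, texts: Union[str, List[str]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the embeddings endpoint."""
    # The body carries the two required parameters: model and input.
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        # Assumption: Bearer-token auth, as in the OpenAI API.
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (needs network access and a valid key):
#   with urllib.request.urlopen(build_embeddings_request(key, model, texts)) as resp:
#       payload = json.load(resp)
```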

Request Parameters

model

string

required

The embedding model to use. Examples: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

input

string | array

required

The text to embed. Pass a single string for one embedding, or an array of strings to embed multiple texts in a single request. Batching multiple inputs into one request is more efficient than sending a separate request per text, because it avoids one network round trip per input.
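For large corpora you will usually split the texts into several array-valued requests. The helper below is a minimal sketch of that splitting step; it preserves input order so results can be concatenated afterwards. The cap MAX_INPUTS_PER_REQUEST is a hypothetical value chosen for illustration, since this page does not document a per-request input limit for NexLLM.

```python
from typing import Iterator, List

# Hypothetical cap; check NexLLM's actual limits before relying on it.
MAX_INPUTS_PER_REQUEST = 100


def batched(texts: List[str], batch_size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of texts, each at most batch_size long, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]
```

Each yielded batch can then be passed as input to client.embeddings.create, one call per batch.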

Response Fields

data

array

An array of embedding objects, one for each input string. The order matches the order of the inputs you provided.

data[].embedding

array of floats

The embedding vector for this input — an array of floating-point numbers. The dimensionality depends on the model you chose.

data[].index

integer

The zero-based index of this embedding in the data array, corresponding to the position of the input string you submitted.

usage.prompt_tokens

integer

The total number of tokens processed across all inputs in the request.
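When you embed an array of strings, the data[].index field tells you which input each vector belongs to. A minimal sketch that pairs each input with its vector by index, rather than trusting list position, might look like this. The function name is hypothetical; it only assumes each item has index and embedding attributes, as the SDK response objects do.

```python
from typing import List, Sequence, Tuple


def pair_inputs_with_embeddings(
    inputs: Sequence[str], data: Sequence
) -> List[Tuple[str, List[float]]]:
    """Pair each input string with its vector using the data[].index field."""
    if len(inputs) != len(data):
        raise ValueError("expected exactly one embedding per input")
    # Map the zero-based input position to its vector.
    by_index = {item.index: item.embedding for item in data}
    return [(text, by_index[i]) for i, text in enumerate(inputs)]
```

Usage: pair_inputs_with_embeddings(my_texts, response.data), where my_texts is the list you passed as input.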

Code Example

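from openai import OpenAI

# Point the OpenAI Python SDK at NexLLM's OpenAI-compatible API.
client = OpenAI(
    api_key="sk-xxxxxxxxxxxxxxxx",  # replace with your NexLLM API key
    base_url="https://www.nexllm.ai/v1"
)

# Embed a single string.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

# The vector is a long list of floats; print only the first few values.
print(response.data[0].embedding[:5])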

To embed multiple strings at once, pass a list as input:

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "The quick brown fox jumps over the lazy dog",
        "Pack my box with five dozen liquor jugs"
    ]
)

for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)}-dimension vector")
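To turn the two vectors from the batch request into a similarity score, a common choice is cosine similarity: the dot product of the vectors divided by the product of their lengths. The sketch below uses only the standard library; vectors must come from the same model, since dimensionality differs between models.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

For example, cosine_similarity(response.data[0].embedding, response.data[1].embedding) scores the two sentences above; values closer to 1 indicate more similar meaning.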

When building a RAG pipeline, embed your document chunks once at ingestion time and store the vectors in a vector database (e.g. Pinecone, Weaviate, or pgvector). At query time, embed the user’s question and perform a nearest-neighbour search to retrieve relevant chunks before calling the Chat Completions endpoint.
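The retrieval step can be prototyped before you pick a vector database. The class below is a hypothetical, brute-force in-memory stand-in, not part of NexLLM or any database client: it stores unit-length vectors so a dot product equals cosine similarity, and returns the top-k chunks for a query vector.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class InMemoryIndex:
    """Brute-force nearest-neighbour search; an illustrative stand-in for a vector DB."""

    def __init__(self) -> None:
        self._chunks: List[str] = []
        self._vectors: List[List[float]] = []

    def add(self, chunk: str, vector: Sequence[float]) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._chunks.append(chunk)
        self._vectors.append(_normalize(vector))

    def search(self, query_vector: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        q = _normalize(query_vector)
        scored = (
            (chunk, sum(a * b for a, b in zip(q, vec)))
            for chunk, vec in zip(self._chunks, self._vectors)
        )
        # Keep the k highest-scoring chunks, best first.
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

At ingestion you would call index.add(chunk, item.embedding) for each embedded chunk; at query time, embed the question and pass its vector to index.search. A real vector database replaces the linear scan with an approximate index once the corpus grows.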
