The Embeddings endpoint converts text into dense numerical vectors that capture semantic meaning. You can use these vectors to power semantic search, build retrieval-augmented generation (RAG) pipelines, cluster similar content, or compute similarity scores between pieces of text. NexLLM’s embeddings endpoint is OpenAI-compatible, so an existing OpenAI embeddings workflow should carry over once you point the client at the NexLLM base URL and supply your NexLLM API key.
Endpoint
POST https://www.nexllm.ai/v1/embeddings
Request Parameters
model: The embedding model to use. Examples: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
input: The text to embed. Pass a single string for one embedding, or an array of strings to embed multiple texts in a single request. Batching multiple inputs into one request is more efficient than making a separate request for each.
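For readers not using an SDK, the request can be sketched with the Python standard library alone. This is a minimal sketch rather than an official client: the helper name build_embeddings_request is hypothetical, and the Bearer-token Authorization header is an assumption based on the endpoint being OpenAI-compatible. The example builds the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from this page. The Bearer-token header below is an
# assumption based on the endpoint being OpenAI-compatible.
EMBEDDINGS_URL = "https://www.nexllm.ai/v1/embeddings"


def build_embeddings_request(api_key, model, texts):
    """Build (but do not send) a POST request for the embeddings endpoint.

    `texts` may be a single string or a list of strings.
    """
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embeddings_request(
    "sk-xxxxxxxxxxxxxxxx",  # placeholder key
    "text-embedding-3-small",
    ["first text", "second text"],
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

The JSON body carries only the two parameters described above; passing a list as texts puts both inputs in a single request.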
Response Fields
data: An array of embedding objects, one for each input string. The order matches the order of the inputs you provided.
data[].embedding: The embedding vector for this input, as an array of floating-point numbers. The dimensionality depends on the model you chose.
data[].index: The zero-based index of this embedding in the data array, corresponding to the position of the input string you submitted.
usage.total_tokens: The total number of tokens processed across all inputs in the request.
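The fields above can be read from a parsed JSON response roughly as follows. This is a sketch under assumptions: the sample payload is made up (with shortened 3-dimension vectors), the key names follow the OpenAI response shape (data, embedding, index, usage.total_tokens), and vectors_by_input is a hypothetical helper.

```python
# Hypothetical parsed response with shortened 3-dimension vectors.
# Key names assume the OpenAI-compatible response shape.
sample_response = {
    "data": [
        {"index": 0, "embedding": [0.1, 0.2, 0.3]},
        {"index": 1, "embedding": [0.4, 0.5, 0.6]},
    ],
    "usage": {"total_tokens": 12},
}


def vectors_by_input(inputs, payload):
    """Pair each input string with its vector using the `index` field."""
    paired = {}
    for item in payload["data"]:
        paired[inputs[item["index"]]] = item["embedding"]
    return paired


inputs = ["first text", "second text"]
vectors = vectors_by_input(inputs, sample_response)
total_tokens = sample_response["usage"]["total_tokens"]
```

Because the order of data already matches the order of the inputs, matching on index is a defensive choice rather than a requirement.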
Code Example
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxxxxxxxxxxxxxxx",
    base_url="https://www.nexllm.ai/v1"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

print(response.data[0].embedding)
To embed multiple strings at once, pass a list as input:
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "The quick brown fox jumps over the lazy dog",
        "Pack my box with five dozen liquor jugs"
    ]
)

# One embedding object is returned per input, in input order.
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)}-dimension vector")
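When there are more texts than you want to send in one request, they can be split into batches. The sketch below is illustrative only: embed_in_batches and make_embed_batch are hypothetical helpers, and the default batch_size of 100 is an arbitrary choice, since this page does not state a per-request limit.

```python
def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed `texts` in chunks of `batch_size`, preserving input order.

    `embed_batch` is any callable that takes a list of strings and returns
    one vector per string, in the same order. The default batch size is an
    arbitrary illustration, not a documented limit.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[start:start + batch_size]))
    return vectors


def make_embed_batch(client, model):
    """Wrap an OpenAI-style client as an `embed_batch` callable."""
    def embed_batch(batch):
        response = client.embeddings.create(model=model, input=batch)
        return [item.embedding for item in response.data]
    return embed_batch
```

With the client created earlier, a call might look like embed_in_batches(chunks, make_embed_batch(client, "text-embedding-3-small")), where chunks is your own list of strings.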
When building a RAG pipeline, embed your document chunks once at ingestion time and store the vectors in a vector database (e.g. Pinecone, Weaviate, or pgvector). At query time, embed the user’s question and perform a nearest-neighbour search to retrieve relevant chunks before calling the Chat Completions endpoint.
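The query-time step can be illustrated without a vector database. The sketch below ranks stored chunks by cosine similarity in plain Python. It makes several assumptions: the vectors are toy 3-dimension stand-ins for real embeddings, the chunk texts are invented, and top_k_chunks is a hypothetical helper doing a brute-force search that a vector database would replace with an index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, stored, k=2):
    """Return the text of the k stored chunks most similar to the query.

    `stored` is a list of (chunk_text, vector) pairs.
    """
    ranked = sorted(
        stored,
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy vectors standing in for embeddings computed at ingestion time.
stored_chunks = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("To request a refund, open a support ticket.", [0.8, 0.3, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # would come from embedding the user's question
context = top_k_chunks(query_vector, stored_chunks, k=2)
```

Here context holds the two refund-related chunks, which would then be included in the prompt sent to the Chat Completions endpoint.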