Understanding how costs are calculated helps you budget accurately and avoid unexpected charges. NexLLM bills based on the tokens you send and receive, with additional line items for caching, audio input, and the group ratio assigned to your API key. This page explains how to find a model’s pricing, what each pricing term means, and how to calculate your estimated cost for Claude, OpenAI, and Gemini models.

Finding a Model

The Model Square page is your starting point for exploring available models and their pricing. Use the search bar to find models by entering keywords such as:
  • Model name (e.g., claude-3-5-sonnet)
  • Provider name (e.g., Anthropic, Google)
  • Endpoint name (e.g., chat/completions)
  • Model tag (e.g., 128k, vision)
The left-side filter panel lets you narrow results further using the following filters:
Filter | Description
Groups | Show only models available in a specific channel group, such as default, claude, gemini, or others.
Vendors | Filter by provider or vendor, for example Anthropic, OpenAI, AWS, or Google.
Model Tags | Filter by capability or context size, such as 1M, 128k, 198k, or other configured tags.
Pricing Type | Filter by billing method. Token-based models are billed by token usage; Per Request models are billed per API call.
Endpoint Type | Filter by supported endpoint, such as chat completions, text completions, embeddings, image generation, audio, rerank, responses, realtime, and others.
Once you find a model, click Details to view its full pricing breakdown, context window, maximum output, supported modalities, API compatibility, and other metadata.

Understanding the Pricing Display

Model pricing is shown per 1 million tokens. Depending on the model and provider, you may see any combination of the following pricing items:
Pricing Item | Description
INPUT | Cost for tokens sent to the model: user messages, system prompts, and all other prompt content.
OUTPUT | Cost for tokens generated by the model in its response.
CACHE READ | Cost for tokens read from cache, when prompt caching or context caching is supported.
CACHE WRITE | Cost for tokens written into cache. This rate may vary depending on the cache duration.
AUDIO IN | Cost for audio input tokens, when the model supports audio or multimodal input billing.
GROUP RATIO | A multiplier applied to the total model cost based on your API key’s channel group.

General Price Calculation

For standard token-based usage, you can estimate your cost using the following formula:
Total Cost =
  (Input Tokens / 1,000,000 × Input Price)
  + (Output Tokens / 1,000,000 × Output Price)
  + (Cache Read Tokens / 1,000,000 × Cache Read Price)
  + (Cache Write Tokens / 1,000,000 × Cache Write Price)

Final Cost = Total Cost × Group Ratio
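The formula above can be sketched as a small Python function. This is a minimal illustration, not NexLLM's billing code: the function name, parameter names, and the prices in the usage line are hypothetical, and all prices are assumed to be quoted in USD per 1 million tokens.

```python
PER_MILLION = 1_000_000  # prices are quoted per 1 million tokens


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    *,
    input_price: float,
    output_price: float,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    cache_read_price: float = 0.0,
    cache_write_price: float = 0.0,
    group_ratio: float = 1.0,
) -> float:
    """Estimate the final charge for one request (hypothetical helper)."""
    # Total Cost: each token count is scaled to millions, then priced.
    total_cost = (
        input_tokens / PER_MILLION * input_price
        + output_tokens / PER_MILLION * output_price
        + cache_read_tokens / PER_MILLION * cache_read_price
        + cache_write_tokens / PER_MILLION * cache_write_price
    )
    # Final Cost: the group ratio multiplies the whole total.
    return total_cost * group_ratio


# Illustrative prices only; check the model's detail page for real values.
print(estimate_cost(1_000_000, 1_000_000, input_price=2.0, output_price=8.0, group_ratio=0.5))
```

Cache arguments default to zero, so the same function covers models that do not expose cache pricing.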

Worked Example

Assume a model with the following base pricing is used for a single API request with these token counts:
Usage Type | Base Price | Token Usage | Cost
Input | $3.00 / 1M | 100,000 | $0.30
Output | $15.00 / 1M | 20,000 | $0.30
Cache Read | $0.30 / 1M | 500,000 | $0.15
Cache Write | $3.75 / 1M | 500,000 | $1.875
Total | | | $2.625
If your API key belongs to a group with a 1.2x ratio, the final charge is:
$2.625 × 1.2 = $3.15
The final estimated charge for this request is $3.15.
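To double-check the arithmetic in this example, the line items can be recomputed with Python's decimal module, which avoids binary floating-point rounding when working with currency. The variable names are illustrative; the prices, token counts, and 1.2x ratio are the ones from the example.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# (usage type, base price in USD per 1M tokens, tokens used) from the example
line_items = [
    ("Input", Decimal("3.00"), 100_000),
    ("Output", Decimal("15.00"), 20_000),
    ("Cache Read", Decimal("0.30"), 500_000),
    ("Cache Write", Decimal("3.75"), 500_000),
]

# Cost per line item = tokens / 1,000,000 x base price
costs = {name: price * tokens / PER_MILLION for name, price, tokens in line_items}

total = sum(costs.values(), Decimal(0))
final = total * Decimal("1.2")  # group ratio from the example

for name, cost in costs.items():
    print(f"{name}: ${cost}")
print(f"Total: ${total}")
print(f"Final charge: ${final}")
```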

Claude Model Pricing

Claude models use separate rates for input, output, and two types of cache writes, each priced relative to the base input token price. Anthropic defines the following cache multipliers:
Cache Type | Pricing Rule
5-minute cache write | 1.25× the base input token price
1-hour cache write | 2× the base input token price
Cache read / cache hit | 0.1× the base input token price
A cache write is charged when content is stored in the cache. A cache read is charged when a subsequent request reuses that cached content. These rules are provider-specific and only apply when prompt caching is enabled and used. For Claude models, the general cost formula is:
Claude Cost =
  (Input Tokens / 1,000,000 × Input Price)
  + (Output Tokens / 1,000,000 × Output Price)
  + (5-Minute Cache Write Tokens / 1,000,000 × Input Price × 1.25)
  + (1-Hour Cache Write Tokens / 1,000,000 × Input Price × 2)
  + (Cache Read Tokens / 1,000,000 × Input Price × 0.1)
After calculating the model cost, multiply by your group ratio to get the final charge.
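A minimal Python sketch of the Claude calculation follows. The function and parameter names are hypothetical. It assumes prices are quoted in USD per 1 million tokens and that input_tokens counts only tokens that were neither written to nor read from the cache.

```python
PER_MILLION = 1_000_000

# Cache multipliers relative to the base input price, as listed above.
FIVE_MINUTE_WRITE_MULTIPLIER = 1.25
ONE_HOUR_WRITE_MULTIPLIER = 2.0
CACHE_READ_MULTIPLIER = 0.1


def claude_cost(
    input_tokens: int,
    output_tokens: int,
    *,
    input_price: float,
    output_price: float,
    cache_write_5m_tokens: int = 0,
    cache_write_1h_tokens: int = 0,
    cache_read_tokens: int = 0,
    group_ratio: float = 1.0,
) -> float:
    """Estimate the final charge for a Claude request (hypothetical helper)."""
    cost = (
        input_tokens * input_price
        + output_tokens * output_price
        + cache_write_5m_tokens * input_price * FIVE_MINUTE_WRITE_MULTIPLIER
        + cache_write_1h_tokens * input_price * ONE_HOUR_WRITE_MULTIPLIER
        + cache_read_tokens * input_price * CACHE_READ_MULTIPLIER
    ) / PER_MILLION
    # The group ratio is applied after the model cost is computed.
    return cost * group_ratio


# Same token counts as the worked example, using a 5-minute cache write.
print(claude_cost(
    100_000, 20_000,
    input_price=3.00, output_price=15.00,
    cache_write_5m_tokens=500_000, cache_read_tokens=500_000,
    group_ratio=1.2,
))
```

With a $3.00 base input price, the multipliers give $3.75 per 1M for a 5-minute cache write and $0.30 per 1M for a cache read. These are the same rates used in the worked example.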

OpenAI Model Pricing

OpenAI-compatible models separate pricing into input tokens, cached input tokens, and output tokens. If the provider returns cached input, the platform applies the cache read price for that model.
OpenAI Cost =
  (Input Tokens / 1,000,000 × Input Price)
  + (Cached Input Tokens / 1,000,000 × Cached Input Price)
  + (Output Tokens / 1,000,000 × Output Price)
OpenAI pricing does not use the 5-minute and 1-hour cache write model that Claude uses. When cached input is returned by OpenAI, the cache read rate applies automatically.
After calculating the model cost, multiply by your group ratio to get the final charge.
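The OpenAI formula can be sketched the same way. The names and the prices in the usage line are hypothetical. Prices are assumed to be in USD per 1 million tokens, and input_tokens is assumed to exclude cached input tokens, so no token is billed at both rates. If your usage data reports cached tokens as part of the prompt total, subtract them first.

```python
PER_MILLION = 1_000_000


def openai_cost(
    input_tokens: int,
    output_tokens: int,
    *,
    input_price: float,
    output_price: float,
    cached_input_tokens: int = 0,
    cached_input_price: float = 0.0,
    group_ratio: float = 1.0,
) -> float:
    """Estimate the final charge for an OpenAI-style request (hypothetical helper)."""
    cost = (
        input_tokens * input_price                   # uncached input (assumption)
        + cached_input_tokens * cached_input_price   # cache read rate
        + output_tokens * output_price
    ) / PER_MILLION
    return cost * group_ratio


# Illustrative prices only; they are not taken from any real model.
print(openai_cost(
    1_000_000, 500_000,
    input_price=2.0, output_price=8.0,
    cached_input_tokens=1_000_000, cached_input_price=0.5,
))
```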

Gemini Model Pricing

Gemini models can include input tokens, output tokens, cached input, audio input, and context cache storage charges. Several Gemini models also apply tiered pricing based on prompt size:
Prompt Size | Pricing Behavior
≤ 200k tokens | Lower input, output, and cache rates apply.
> 200k tokens | Higher long-context rates apply.
When a request exceeds the 200k-token threshold and the selected model uses tiered pricing, the higher tier is applied according to the model’s configured pricing. For Gemini models, the general cost formula is:
Gemini Cost =
  (Input Tokens / 1,000,000 × Applicable Input Price)
  + (Output Tokens / 1,000,000 × Applicable Output Price)
  + (Cached Input Tokens / 1,000,000 × Applicable Cache Price)
  + (Audio Input Tokens / 1,000,000 × Audio Input Price, if applicable)
  + (Context Cache Storage Charge, if applicable)
After calculating the model cost, multiply by your group ratio to get the final charge.
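The sketch below shows one way to model the tier selection in Python. It rests on several assumptions: all names are hypothetical, prices are in USD per 1 million tokens, the tier prices in the usage lines are made up for illustration, and the caller supplies the prompt size used for the tier check, because how that size is measured depends on the model's configured pricing. The storage charge is passed in as a precomputed amount.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000
TIER_THRESHOLD = 200_000  # prompts above this size use the higher tier


@dataclass(frozen=True)
class TierPrices:
    input_price: float
    output_price: float
    cache_price: float


def gemini_cost(
    prompt_tokens: int,
    input_tokens: int,
    output_tokens: int,
    *,
    low_tier: TierPrices,
    high_tier: TierPrices,
    cached_input_tokens: int = 0,
    audio_input_tokens: int = 0,
    audio_input_price: float = 0.0,
    storage_charge: float = 0.0,
    group_ratio: float = 1.0,
) -> float:
    """Estimate the final charge for a tiered Gemini request (hypothetical helper)."""
    # Pick the applicable tier from the prompt size.
    tier = high_tier if prompt_tokens > TIER_THRESHOLD else low_tier
    token_cost = (
        input_tokens * tier.input_price
        + output_tokens * tier.output_price
        + cached_input_tokens * tier.cache_price
        + audio_input_tokens * audio_input_price
    ) / PER_MILLION
    # Storage is a flat amount here, added before the group ratio is applied.
    return (token_cost + storage_charge) * group_ratio


# Made-up tier prices, for illustration only.
low = TierPrices(input_price=1.0, output_price=10.0, cache_price=0.1)
high = TierPrices(input_price=2.0, output_price=15.0, cache_price=0.2)
print(gemini_cost(100_000, 100_000, 10_000, low_tier=low, high_tier=high))
print(gemini_cost(300_000, 300_000, 10_000, low_tier=low, high_tier=high))
```

A prompt of exactly 200,000 tokens stays in the lower tier, which matches the ≤ 200k boundary above.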

Group Ratio

Each channel group can carry its own ratio that scales the base model cost up or down. The Pricing by Group section on a model’s detail page shows the effective price for each group.
Always confirm that you’re viewing pricing for your correct channel group before making API calls. If your API key belongs to a group with a higher ratio, your effective price will be higher than the base rate shown.
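As a rough illustration of how a group ratio changes the effective price, the Python sketch below scales a set of base prices by each group's ratio. The function name and the ratios are hypothetical. The group names come from the filter examples earlier on this page, and the base prices come from the worked example.

```python
def effective_prices(base_prices: dict, group_ratios: dict) -> dict:
    """Scale every base price (USD per 1M tokens) by each group's ratio."""
    return {
        group: {item: price * ratio for item, price in base_prices.items()}
        for group, ratio in group_ratios.items()
    }


base = {"input": 3.00, "output": 15.00}   # base prices per 1M tokens
ratios = {"default": 1.0, "claude": 1.2}  # hypothetical ratios

for group, prices in effective_prices(base, ratios).items():
    print(group, prices)
```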

Important Notes

Pricing varies by provider, model, channel group, enabled features, and token type. Before using a model in production, review its detail page and confirm the pricing that applies to your assigned group.
For the most accurate cost estimate, always verify:
  • The selected model and its provider
  • Your API key’s assigned channel group
  • Input and output token usage
  • Cache write and cache read usage
  • Audio or multimodal token usage
  • Whether your request falls under the ≤200k or >200k pricing tier
  • Any provider-specific cache rules or group ratio adjustments