Model Pricing: Token Costs, Caching, and Group Ratios

Understanding how costs are calculated helps you budget accurately and avoid unexpected charges. NexLLM bills based on the tokens you send and receive, with separate rates for caching and audio input, and then applies the group ratio assigned to your API key as a multiplier. This page explains how to find a model’s pricing, what each pricing term means, and how to calculate your estimated cost for Claude, OpenAI, and Gemini models.

Finding a Model

The Model Square page is your starting point for exploring available models and their pricing. Use the search bar to find models by entering keywords such as:

Model name (e.g., claude-3-5-sonnet)
Provider name (e.g., Anthropic, Google)
Endpoint name (e.g., chat/completions)
Model tag (e.g., 128k, vision)

The left-side filter panel lets you narrow results further using the following filters:

Filter	Description
Groups	Show only models available in a specific channel group, such as `default`, `claude`, `gemini`, or others.
Vendors	Filter by provider or vendor — for example, Anthropic, OpenAI, AWS, or Google.
Model Tags	Filter by capability or context size, such as `1M`, `128k`, `198k`, or other configured tags.
Pricing Type	Filter by billing method. Token-based models are billed by token usage; Per Request models are billed per API call.
Endpoint Type	Filter by supported endpoint, such as chat completions, text completions, embeddings, image generation, audio, rerank, responses, realtime, and others.

Once you find a model, click Details to view its full pricing breakdown, context window, maximum output, supported modalities, API compatibility, and other metadata.

Understanding the Pricing Display

Model pricing is shown per 1 million tokens. Depending on the model and provider, you may see any combination of the following pricing items:

Pricing Item	Description
INPUT	Cost for tokens sent to the model — user messages, system prompts, and all other prompt content.
OUTPUT	Cost for tokens generated by the model in its response.
CACHE READ	Cost for tokens read from cache, when prompt caching or context caching is supported.
CACHE WRITE	Cost for tokens written into cache. This rate may vary depending on the cache duration.
AUDIO IN	Cost for audio input tokens, when the model supports audio or multimodal input billing.
GROUP RATIO	A multiplier applied to the total model cost based on your API key’s channel group.

General Price Calculation

For standard token-based usage, you can estimate your cost using the following formula:

Total Cost =
  (Input Tokens / 1,000,000 × Input Price)
  + (Output Tokens / 1,000,000 × Output Price)
  + (Cache Read Tokens / 1,000,000 × Cache Read Price)
  + (Cache Write Tokens / 1,000,000 × Cache Write Price)

Final Cost = Total Cost × Group Ratio

Worked Example

Assume a model with the following base pricing is used for a single API request with these token counts:

Usage Type	Base Price	Token Usage	Cost
Input	$3.00 / 1M	100,000	$0.30
Output	$15.00 / 1M	20,000	$0.30
Cache Read	$0.30 / 1M	500,000	$0.15
Cache Write	$3.75 / 1M	500,000	$1.875
Total	—	—	$2.625

If your API key belongs to a group with a 1.2x ratio, the final charge is:

$2.625 × 1.2 = $3.15

The final estimated charge for this request is $3.15.
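The formula and worked example above can be sketched in Python as follows. This is a minimal illustration, not the platform's billing code: the function name `estimate_cost` and the dictionary keys are hypothetical, and `Decimal` is used only to avoid floating-point rounding in the dollar amounts.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(usage, prices, group_ratio=Decimal("1")):
    """Return (total_cost, final_cost) in dollars.

    usage  -- token counts keyed by usage type (hypothetical key names)
    prices -- dollar price per 1M tokens, keyed the same way
    """
    total = sum(
        (Decimal(tokens) / TOKENS_PER_UNIT * prices[kind]
         for kind, tokens in usage.items()),
        Decimal("0"),
    )
    # The group ratio multiplies the summed model cost.
    return total, total * group_ratio


# Figures taken from the worked example table.
prices = {
    "input": Decimal("3.00"),
    "output": Decimal("15.00"),
    "cache_read": Decimal("0.30"),
    "cache_write": Decimal("3.75"),
}
usage = {
    "input": 100_000,
    "output": 20_000,
    "cache_read": 500_000,
    "cache_write": 500_000,
}

total, final = estimate_cost(usage, prices, group_ratio=Decimal("1.2"))
print(f"Total: ${total:.3f}  Final: ${final:.2f}")
```

With the table's inputs, `total` equals 2.625 and `final` equals 3.15, matching the figures above.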

Claude Model Pricing

Claude models bill input and output tokens at their listed rates, and price cache writes and cache reads as multiples of the base input token price. Anthropic defines the following cache multipliers:

Cache Type	Pricing Rule
5-minute cache write	`1.25×` the base input token price
1-hour cache write	`2×` the base input token price
Cache read / cache hit	`0.1×` the base input token price

A cache write is charged when content is stored in the cache. A cache read is charged when a subsequent request reuses that cached content. These rules are provider-specific and only apply when prompt caching is enabled and used. For Claude models, the general cost formula is:

Claude Cost =
  (Input Tokens / 1,000,000 × Input Price)
  + (Output Tokens / 1,000,000 × Output Price)
  + (5-Minute Cache Write Tokens / 1,000,000 × Input Price × 1.25)
  + (1-Hour Cache Write Tokens / 1,000,000 × Input Price × 2)
  + (Cache Read Tokens / 1,000,000 × Input Price × 0.1)

After calculating the model cost, multiply by your group ratio to get the final charge.
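As a rough sketch of the Claude rules above, the function below derives the cache rates from the base input price instead of taking them as separate inputs. The name `claude_cost` and its parameters are hypothetical, and the prices in the usage lines are borrowed from the worked example purely for illustration; check the model's detail page for real rates.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)  # prices are quoted per 1M tokens

# Cache multipliers from the table above, relative to the base input price.
CACHE_WRITE_5M = Decimal("1.25")
CACHE_WRITE_1H = Decimal("2")
CACHE_READ = Decimal("0.1")


def claude_cost(input_price, output_price, *, input_tokens=0, output_tokens=0,
                cache_write_5m_tokens=0, cache_write_1h_tokens=0,
                cache_read_tokens=0):
    """Estimate the cost in dollars before the group ratio is applied."""
    # Every cache charge is a multiple of the input price, so weight the
    # token counts first and apply the input price once.
    weighted_input = (
        Decimal(input_tokens)
        + Decimal(cache_write_5m_tokens) * CACHE_WRITE_5M
        + Decimal(cache_write_1h_tokens) * CACHE_WRITE_1H
        + Decimal(cache_read_tokens) * CACHE_READ
    )
    return (weighted_input * input_price
            + Decimal(output_tokens) * output_price) / PER_MILLION


# Illustrative prices only: $3.00 input and $15.00 output per 1M tokens.
cost = claude_cost(
    Decimal("3.00"), Decimal("15.00"),
    input_tokens=100_000, output_tokens=20_000,
    cache_write_5m_tokens=500_000, cache_read_tokens=500_000,
)
final_charge = cost * Decimal("1.2")  # hypothetical group ratio
```

At a $3.00 input price, the multipliers give $3.75 per 1M for 5-minute cache writes and $0.30 per 1M for cache reads, which are the same cache rates used in the worked example.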

OpenAI Model Pricing

OpenAI-compatible models separate pricing into input tokens, cached input tokens, and output tokens. If the provider returns cached input, the platform applies the cache read price for that model.

OpenAI Cost =
  (Input Tokens / 1,000,000 × Input Price)
  + (Cached Input Tokens / 1,000,000 × Cached Input Price)
  + (Output Tokens / 1,000,000 × Output Price)

Unlike Claude, OpenAI pricing has no 5-minute or 1-hour cache write rates. The cached input rate is the only cache-related charge, and it applies automatically whenever OpenAI reports cached input for a request.

After calculating the model cost, multiply by your group ratio to get the final charge.
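The sketch below applies the OpenAI formula to a usage object shaped like the one in OpenAI Chat Completions responses, where `prompt_tokens_details.cached_tokens` is reported as a subset of `prompt_tokens`. Treating the remaining prompt tokens as the "Input Tokens" term is an assumption about how the platform splits the count, and the function name and prices are placeholders rather than real rates.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)  # prices are quoted per 1M tokens


def openai_cost(usage, input_price, cached_input_price, output_price):
    """Estimate the cost in dollars before the group ratio is applied.

    Assumption: cached tokens are included in prompt_tokens, so they are
    subtracted out before the regular input price is applied.
    """
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    uncached = usage["prompt_tokens"] - cached
    return (
        Decimal(uncached) * input_price
        + Decimal(cached) * cached_input_price
        + Decimal(usage["completion_tokens"]) * output_price
    ) / PER_MILLION


# Hypothetical usage payload and placeholder prices per 1M tokens.
usage = {
    "prompt_tokens": 12_000,
    "completion_tokens": 1_000,
    "prompt_tokens_details": {"cached_tokens": 8_000},
}
cost = openai_cost(usage, Decimal("2.50"), Decimal("1.25"), Decimal("10.00"))
```

If the response carries no cached-token details, the function falls back to billing the whole prompt at the regular input price.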

Gemini Model Pricing

Gemini pricing can include charges for input tokens, output tokens, cached input, audio input, and context cache storage. Several Gemini models also apply tiered pricing based on prompt size:

Prompt Size	Pricing Behavior
`≤ 200k tokens`	Lower input, output, and cache rates apply.
`> 200k tokens`	Higher long-context rates apply.

When a request exceeds the 200k-token threshold and the selected model uses tiered pricing, the higher tier is applied according to the model’s configured pricing. For Gemini models, the general cost formula is:

Gemini Cost =
  (Input Tokens / 1,000,000 × Applicable Input Price)
  + (Output Tokens / 1,000,000 × Applicable Output Price)
  + (Cached Input Tokens / 1,000,000 × Applicable Cache Price)
  + (Audio Input Tokens / 1,000,000 × Audio Input Price, if applicable)
  + (Context Cache Storage Charge, if applicable)

After calculating the model cost, multiply by your group ratio to get the final charge.
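One way to picture the tier selection is the sketch below. It assumes that prompt size means input tokens plus cached input tokens and that the chosen tier applies to the whole request; both points should be confirmed against the model's configured pricing. The `Tier` class, the `gemini_cost` function, and all prices are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)  # prices are quoted per 1M tokens
TIER_THRESHOLD = 200_000          # prompts at or below this use the lower tier


@dataclass(frozen=True)
class Tier:
    input_price: Decimal
    output_price: Decimal
    cache_price: Decimal


def gemini_cost(standard, long_context, *, input_tokens, output_tokens,
                cached_tokens=0, audio_tokens=0, audio_price=Decimal("0"),
                storage_charge=Decimal("0")):
    """Estimate the cost in dollars before the group ratio is applied."""
    # Assumption: cached input counts toward the prompt size.
    prompt_size = input_tokens + cached_tokens
    tier = standard if prompt_size <= TIER_THRESHOLD else long_context
    token_cost = (
        Decimal(input_tokens) * tier.input_price
        + Decimal(output_tokens) * tier.output_price
        + Decimal(cached_tokens) * tier.cache_price
        + Decimal(audio_tokens) * audio_price
    ) / PER_MILLION
    # Storage is a flat dollar amount, not a per-token rate.
    return token_cost + storage_charge


# Placeholder prices per 1M tokens, not real Gemini rates.
standard = Tier(Decimal("1.25"), Decimal("10.00"), Decimal("0.125"))
long_context = Tier(Decimal("2.50"), Decimal("15.00"), Decimal("0.25"))

short_request = gemini_cost(standard, long_context,
                            input_tokens=100_000, output_tokens=10_000)
long_request = gemini_cost(standard, long_context,
                           input_tokens=300_000, output_tokens=10_000)
```

With these placeholder rates, tripling the prompt from 100,000 to 300,000 tokens raises the estimate from $0.225 to $0.90, because crossing the threshold moves every token in the request to the higher tier.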

Group Ratio

Each channel group can carry its own ratio that scales the base model cost up or down. The Pricing by Group section on a model’s detail page shows the effective price for each group.

Always confirm that you’re viewing pricing for your correct channel group before making API calls. If your API key belongs to a group with a higher ratio, your effective price will be higher than the base rate shown.
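To compare groups side by side, the effective per-group prices can be derived from the base rates, as in this small sketch. The group names come from the filter examples earlier on this page, but the ratios and base prices are made-up placeholders; the Pricing by Group section remains the authoritative source.

```python
from decimal import Decimal


def effective_prices(base_prices, group_ratios):
    """Return {group: {pricing item: effective price per 1M tokens}}."""
    return {
        group: {item: price * ratio for item, price in base_prices.items()}
        for group, ratio in group_ratios.items()
    }


# Placeholder base prices per 1M tokens and hypothetical group ratios.
base_prices = {"input": Decimal("3.00"), "output": Decimal("15.00")}
group_ratios = {"default": Decimal("1"), "claude": Decimal("1.2")}

table = effective_prices(base_prices, group_ratios)
for group, items in table.items():
    print(group, {item: f"${price:.2f}" for item, price in items.items()})
```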

Important Notes

Pricing varies by provider, model, channel group, enabled features, and token type. Before using a model in production, review its detail page and confirm the pricing that applies to your assigned group. For the most accurate cost estimate, always verify:

The selected model and its provider
Your API key’s assigned channel group
Input and output token usage
Cache write and cache read usage
Audio or multimodal token usage
Whether your request falls under the ≤200k or >200k pricing tier
Any provider-specific cache rules or group ratio adjustments
