
Custom API Quota

Custom API Quota enables dynamic quota allocation based on specific API call properties. Unlike traditional quotas that count the number of requests to an API provider, the fixed_window_custom_counter strategy allows you to define custom metrics for quota limits.

When using the fixed_window_custom_counter strategy, the quota budget is consumed according to a custom attribute instead of one unit per request. The counter_value_path setting identifies the attribute whose value is charged against the quota.

counter_value_path is a JSONPath expression. It can point at the request (for example, a request header) or at the response (for example, a field in the response body), as the examples below show.
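To make the idea concrete, here is a minimal Python sketch of a fixed window whose budget is charged by an arbitrary extracted value rather than by one per request. It is not Lunar.dev's implementation: the class name, method names, and the decision to let the request that crosses the limit complete are all assumptions made for illustration.

```python
import time


class FixedWindowCustomCounter:
    """Illustrative fixed-window budget charged by a custom amount (hypothetical class)."""

    def __init__(self, max_units, interval_seconds, clock=time.monotonic):
        self.max_units = max_units
        self.interval_seconds = interval_seconds
        self._clock = clock
        self._window_start = clock()
        self._used = 0

    def _roll_window(self):
        # Start a fresh window, and reset usage, once the interval has elapsed.
        now = self._clock()
        if now - self._window_start >= self.interval_seconds:
            self._window_start = now
            self._used = 0

    def allow(self):
        """Return True while budget remains in the current window."""
        self._roll_window()
        return self._used < self.max_units

    def charge(self, amount):
        """Charge the extracted counter value and return the remaining budget."""
        self._roll_window()
        self._used += amount
        return max(self.max_units - self._used, 0)
```

In this sketch, `allow()` would be checked before forwarding a call and `charge()` would be called with whatever value counter_value_path extracted, for example `quota.charge(20)` after a response that reports 20 tokens.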

Scenarios

1. Quota Management Based on LLM Tokens

Rather than limiting the number of requests to a Large Language Model (LLM) API, you can define a quota based on the total number of tokens used. This enables more granular control over consumption, especially when requests vary significantly in token count. Lunar.dev can track specific fields in the API response and enforce limits based on their sum, ensuring you stay within your allocated token budget.

2. Quota Management Based on Google Maps Elements

For the Google Maps Distance Matrix API, you can manage quotas based on the number of "elements" in the request (combinations of origins and destinations). Lunar.dev extracts the number of origins and destinations from the request body and applies a custom quota that multiplies these values to represent the elements. This allows you to control spending based on the complexity of routing queries, preventing unexpected cost overruns from large matrix calculations.
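The multiplication described above can be sketched in a few lines of Python. This only illustrates the arithmetic and is not how Lunar.dev computes it internally; the `origins` and `destinations` keys and the `count_elements` helper are assumed names for a hypothetical request body.

```python
def count_elements(request_body):
    """Return origins x destinations for a hypothetical matrix request body."""
    origins = request_body.get("origins", [])
    destinations = request_body.get("destinations", [])
    # Each origin is paired with every destination, so the counts multiply.
    return len(origins) * len(destinations)


# Hypothetical request: 2 origins and 3 destinations.
example_request = {
    "origins": ["Boston, MA", "New York, NY"],
    "destinations": ["Chicago, IL", "Denver, CO", "Austin, TX"],
}
elements_used = count_elements(example_request)
```

Here a single request consumes 6 units of quota, whereas a per-request quota would have counted it as 1.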

3. Quota Management Based on Data Transfer Size

For APIs where data transfer volume is a significant cost factor, you can configure a custom quota that monitors the Content-Length header in the response. By setting limits on total bytes downloaded within a specific timeframe, you can optimize bandwidth usage and prevent exceeding data transfer limits. This is particularly useful for cloud storage or data analytics services where large data payloads are common.
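As a rough sketch of this scenario, the Python below sums Content-Length values across several responses and compares the total with a byte budget. The helper names and the plain-dict header representation are assumptions for illustration, not part of Lunar.dev.

```python
def content_length(headers):
    """Return the Content-Length value as an int, or 0 if the header is absent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "content-length":
            return int(value)
    return 0


def remaining_bytes(budget, responses_headers):
    """Subtract the bytes reported by each response from the window's budget."""
    used = sum(content_length(headers) for headers in responses_headers)
    return max(budget - used, 0)


# Hypothetical window: a 1000-byte budget and three responses.
left = remaining_bytes(
    1000,
    [{"Content-Length": "300"}, {"content-length": "450"}, {}],
)
```

Responses without a Content-Length header (for example, chunked responses) contribute 0 in this sketch, so a real policy would need to decide how to account for them.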

Example: Custom quota based on tokens

/etc/Lunar-proxy/quotas/{fileName}.yaml
quotas:
  - id: GPT-4o-mini-quota
    filter:
      url: api.openai.com/v1/chat/completions
      headers:
        - key: x-lunar-custom-model
          value: gpt-4o-mini
    strategy:
      fixed_window_custom_counter:
        max: 40000
        interval: 1
        interval_unit: minute
        counter_value_path: |
          $.request.headers["x-lunar-used-tokens"]

OpenAI Example:

In the following example, the quota allows 10,000 tokens per minute. The JSONPath expression $.response.body.usage["total_tokens"] extracts 20 as the number of tokens used by the request shown below, which leaves 10,000 - 20 = 9,980 tokens in the quota for the current window.

/etc/Lunar-proxy/quotas/example_quota.yaml
quotas:
  - id: GPT-4o-mini-quota
    filter:
      url: api.openai.com/v1/chat/completions
      headers:
        - key: x-lunar-custom-model
          value: gpt-4o-mini
    strategy:
      fixed_window_custom_counter:
        max: 10000
        interval: 1
        interval_unit: minute
        counter_value_path: |
          $.response.body.usage["total_tokens"]

OpenAI Response JSON

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o-mini",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "\n\nThis is a test!"
      },
      "logprobs": null,
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

With each response from OpenAI, the fixed_window_custom_counter calculates usage based on total_tokens and updates the remaining quota accordingly.
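The bookkeeping for this response can be reproduced in plain Python. This is a sketch of the arithmetic only; `extract_total_tokens` is a hypothetical helper that mirrors the JSONPath expression with ordinary dictionary lookups, and the response is abbreviated to the fields that matter here.

```python
import json

# Abbreviated copy of the response body shown above.
response_body = json.loads("""
{
  "model": "gpt-4o-mini",
  "usage": {"prompt_tokens": 13, "completion_tokens": 7, "total_tokens": 20}
}
""")


def extract_total_tokens(body):
    """Dictionary equivalent of $.response.body.usage["total_tokens"]."""
    return body["usage"]["total_tokens"]


max_tokens = 10000  # `max` from the quota definition
used = extract_total_tokens(response_body)
remaining = max_tokens - used
print(remaining)  # 9980
```

Because total_tokens is the sum of prompt_tokens and completion_tokens (13 + 7 = 20), both the prompt and the completion count against the budget in this configuration.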