
API Quotas Configuration Template

Quota Configuration Template

This template shows how to configure quotas in Lunar. It includes examples of the fixed window strategy, the concurrent strategy, and the newly added fixed_window_llm strategy. Internal limits (child quotas linked to a parent quota) can be defined for more specific cases.

note

Lunar's YAML configuration files don't need to follow a naming convention; a name such as quota.yaml is only an example. As long as the file is placed in the correct folder, Lunar detects and applies it automatically.

/etc/lunar-proxy/quotas/{fileName}.yaml
quotas:
  - id: MyQuota # Unique identifier for the quota
    filter: # Define filter conditions for this quota
      url: api.website.com/* # URL pattern to apply the quota
      headers: # Optional: Header-based filtering
        - key: x-lunar-consumer-tag # Header to filter on
          value: premium # Example value 1
        - key: x-lunar-consumer-tag # Header to filter on
          value: basic # Example value 2
    strategy: # Quota strategy definition
      fixed_window: # Fixed Window Strategy
        static:
          max: 1000 # Maximum requests allowed within the window
          interval: 24 # Duration of the time window
          interval_unit: hour # Unit of time for the interval (second/minute/hour/day/month)
        group_by_header: x-lunar-consumer-tag # Optional: Group by header value
        dynamic: # Dynamic Header-Based Configuration
          remaining_header: X-RateLimit-Limit
          reset_time_header: X-RateLimit-Reset
          retry_after_header: Retry-After

  - id: LLMQuotaOpenAI # Unique identifier for the quota
    filter:
      url: api.openai.com/* # URL pattern to apply the quota
    strategy:
      fixed_window_llm:
        static:
          max_requests: 100 # Maximum requests allowed within the window
          max_tokens: 10000 # Maximum tokens allowed within the window
          interval: 1
          interval_unit: minute # Time window in minute

      concurrent: # Concurrent Strategy
        max_request_count: 50 # Maximum number of concurrent requests allowed
        remaining_header: X-Concurrent-Remaining # Optional: Header to expose remaining concurrent requests

internal_limits: # Optional: Define nested child quotas within the main quota
  - id: MyChildQuota # Unique identifier for the child quota
    parent_id: MyQuota # Links the child quota to its parent
    filter:
      url: api.website.com/specific # URL pattern specific to the child quota
    strategy:
      fixed_window: # Strategy for managing child quota with a fixed window
        static:
          max: 500 # Maximum requests allowed within the child quota
          interval: 1 # Window duration for the child quota
          interval_unit: day # Unit of time for the interval (second/minute/hour/day/month)

  - id: PremiumQuota
    parent_id: MyQuota
    filter:
      headers:
        - key: x-lunar-consumer-tag
          value: premium
    strategy:
      allocation_percentage: 80 # Percentage allocation of the total requests
  - id: BasicQuota
    parent_id: MyQuota
    filter:
      headers:
        - key: x-lunar-consumer-tag
          value: basic
    strategy:
      fixed_window:
        static:
          max: 20000
          interval: 1
          interval_unit: day
        spillover:
          max: 100 # Enables carryover of unused quota to the next window
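
As a rough illustration of how a percentage-based child quota relates to its parent, the fragment below strips the template down to one parent and one child. The IDs ExampleParent and ExampleChild and the URL pattern api.example.com/* are made up for illustration. The arithmetic in the last comment assumes that allocation_percentage is taken from the parent's static max; the template does not spell this out, so treat it as a sketch rather than confirmed behavior.

```yaml
quotas:
  - id: ExampleParent # hypothetical ID
    filter:
      url: api.example.com/* # hypothetical URL pattern
    strategy:
      fixed_window:
        static:
          max: 1000
          interval: 24
          interval_unit: hour

internal_limits:
  - id: ExampleChild # hypothetical ID
    parent_id: ExampleParent # must match the parent quota's id
    filter:
      headers:
        - key: x-lunar-consumer-tag
          value: premium
    strategy:
      allocation_percentage: 80 # assumed: 80% of 1000 = 800 requests per 24 hours
```

Under that assumption, requests tagged premium would be capped at 800 per window, while the parent still enforces the overall limit of 1000.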

Key Components of the Quota Configuration

| Field | Description | Mandatory/Optional | Example |
| --- | --- | --- | --- |
| quota.id | A unique identifier for the quota. | Mandatory | MyQuota |
| filter.url | The URL pattern that the quota applies to. | Mandatory | api.website.com/* |
| filter.headers.key | Optional header used for filtering requests. | Optional | x-lunar-consumer-tag |
| filter.headers.value | Value of the header to match for filtering. | Optional | 'premium' |
| strategy.fixed_window.static.max | Maximum number of requests allowed within the window. | Mandatory (if used) | 1000 |
| strategy.fixed_window.static.interval | Time window duration for the quota. | Mandatory (if used) | 24 |
| strategy.fixed_window.static.interval_unit | Unit of time for the window (second, minute, hour, day, month). | Mandatory (if used) | hour |
| strategy.fixed_window.group_by_header | Group quota by the value of a specific header. | Optional | x-lunar-consumer-tag |
| strategy.fixed_window.dynamic.remaining_header | Header exposing remaining quota for dynamic configurations. | Optional | X-RateLimit-Limit |
| strategy.fixed_window.dynamic.reset_time_header | Header indicating reset time for dynamic quotas. | Optional | X-RateLimit-Reset |
| strategy.fixed_window.dynamic.retry_after_header | Header for retry-after details in dynamic quotas. | Optional | Retry-After |
| strategy.concurrent.max_request_count | Max concurrent requests allowed. | Mandatory (if used) | 50 |
| strategy.concurrent.remaining_header | Header to expose remaining concurrent requests. | Optional | X-Concurrent-Remaining |
| strategy.fixed_window_llm.static.max_requests | Max requests allowed in the window. | Mandatory for LLM | 100 |
| strategy.fixed_window_llm.static.max_tokens | Maximum tokens allowed in the window. | Optional | 10000 |
| strategy.fixed_window_llm.static.max_input_tokens | Maximum input tokens allowed. | Optional | 5000 |
| strategy.fixed_window_llm.static.max_output_tokens | Maximum output tokens allowed. | Optional | 5000 |
| strategy.fixed_window_llm.static.interval | Time window duration for LLM quota. | Mandatory | 1 |
| strategy.fixed_window_llm.static.interval_unit | Time unit for LLM window. | Mandatory | minute |
| internal_limits.id | Unique identifier for child quotas. | Optional | MyChildQuota |
| internal_limits.parent_id | Links the child quota to its parent quota. | Optional | MyQuota |
| internal_limits.filter.url | URL pattern that the child quota applies to. | Mandatory (if used) | |
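
The table lists max_input_tokens and max_output_tokens, which the template does not use. An LLM quota that limits input and output tokens separately might look like the following. The ID LLMQuotaSplitTokens is hypothetical, and placing both fields under static next to max_requests is inferred from the field paths in the table, not taken from a documented example.

```yaml
quotas:
  - id: LLMQuotaSplitTokens # hypothetical ID
    filter:
      url: api.openai.com/* # URL pattern to apply the quota
    strategy:
      fixed_window_llm:
        static:
          max_requests: 100 # Listed as mandatory for the LLM strategy
          max_input_tokens: 5000 # Optional; example value from the table
          max_output_tokens: 5000 # Optional; example value from the table
          interval: 1
          interval_unit: minute
```

Whether these two fields can be combined with max_tokens in the same quota is not stated here, so this sketch leaves max_tokens out.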