API Quotas Configuration Template
Quota Configuration Template
This template demonstrates how to configure quotas using Lunar's system. It includes examples of the fixed window and concurrent strategies, as well as the newly added fixed_window_llm strategy. Users can define internal limits (child quotas) for more specific cases as needed.
Note: File names for Lunar's YAML configurations don't need to follow a specific convention (e.g., quota.yaml). As long as the file is placed in the correct folder, Lunar will automatically detect and apply it.
/etc/lunar-proxy/quotas/{fileName}.yaml
quotas:
  - id: MyQuota # Unique identifier for the quota
    filter: # Define filter conditions for this quota
      url: api.website.com/* # URL pattern to apply the quota
      headers: # Optional: Header-based filtering
        - key: x-lunar-consumer-tag # Header to filter on
          value: premium # Example value 1
        - key: x-lunar-consumer-tag # Header to filter on
          value: basic # Example value 2
    strategy: # Quota strategy definition
      fixed_window: # Fixed Window Strategy
        static:
          max: 1000 # Maximum requests allowed within the window
          interval: 24 # Duration of the time window
          interval_unit: hour # Unit of time for the interval (second/minute/hour/day/month)
        group_by_header: x-lunar-consumer-tag # Optional: Group by header value
        dynamic: # Dynamic Header-Based Configuration
          remaining_header: X-RateLimit-Limit
          reset_time_header: X-RateLimit-Reset
          retry_after_header: Retry-After
  - id: LLMQuotaOpenAI # Unique identifier for the quota
    filter:
      url: api.openai.com/* # URL pattern to apply the quota
    strategy:
      fixed_window_llm:
        static:
          max_requests: 100 # Maximum requests allowed within the window
          max_tokens: 10000 # Maximum tokens allowed within the window
          interval: 1
          interval_unit: minute # Time window of 1 minute
      concurrent: # Concurrent Strategy
        max_request_count: 50 # Maximum number of concurrent requests allowed
        remaining_header: X-Concurrent-Remaining # Optional: Header to expose remaining concurrent requests
internal_limits: # Optional: Define nested child quotas within the main quota
  - id: MyChildQuota # Unique identifier for the child quota
    parent_id: MyQuota # Links the child quota to its parent
    filter:
      url: api.website.com/specific # URL pattern specific to the child quota
    strategy:
      fixed_window: # Strategy for managing child quota with a fixed window
        static:
          max: 500 # Maximum requests allowed within the child quota
          interval: 1 # Window duration for the child quota
          interval_unit: day # Unit of time for the interval (second/minute/hour/day/month)
  - id: PremiumQuota
    parent_id: MyQuota
    filter:
      headers:
        - key: x-lunar-consumer-tag
          value: premium
    strategy:
      allocation_percentage: 80 # Percentage allocation of the total requests
  - id: BasicQuota
    parent_id: MyQuota
    filter:
      headers:
        - key: x-lunar-consumer-tag
          value: basic
    strategy:
      fixed_window:
        static:
          max: 20000
          interval: 1
          interval_unit: day
        spillover:
          max: 100 # Enables carryover of unused quota to the next window
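As a smaller starting point, the sketch below trims the template down to two quotas built only from fields listed in the table that follows: a plain fixed window quota with its three mandatory static fields, and an LLM quota that limits input and output tokens separately. The ids `OrdersQuota` and `LLMSplitTokenQuota` and the URL `api.example.com/orders/*` are hypothetical placeholders. The sketch also assumes that `max_input_tokens` and `max_output_tokens` can be set alongside `max_requests` without `max_tokens`, which the field table suggests but the template above does not demonstrate.

```yaml
quotas:
  - id: OrdersQuota # Hypothetical id
    filter:
      url: api.example.com/orders/* # Hypothetical URL pattern
    strategy:
      fixed_window:
        static:
          max: 100 # At most 100 requests...
          interval: 1 # ...per window of 1...
          interval_unit: minute # ...minute

  - id: LLMSplitTokenQuota # Hypothetical id
    filter:
      url: api.openai.com/*
    strategy:
      fixed_window_llm:
        static:
          max_requests: 100 # Listed as mandatory for this strategy
          max_input_tokens: 5000 # Assumption: usable without max_tokens
          max_output_tokens: 5000 # Assumption: usable without max_tokens
          interval: 1
          interval_unit: minute
```

Saved under any file name in the quotas folder shown earlier, a file like this would be picked up the same way as the full template; check the field semantics against your Lunar version before relying on it.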
Key Components of the Quota Configuration
Field | Description | Mandatory/Optional | Example |
---|---|---|---|
quota.id | A unique identifier for the quota. | Mandatory | MyQuota |
filter.url | The URL pattern that the quota applies to. | Mandatory | api.website.com/* |
filter.headers.key | Name of the header used to filter requests. | Optional | x-lunar-consumer-tag |
filter.headers.value | Value of the header to match for filtering. | Optional | premium |
strategy.fixed_window.static.max | Maximum number of requests allowed within the window. | Mandatory (if used) | 1000 |
strategy.fixed_window.static.interval | Time window duration for the quota. | Mandatory (if used) | 24 |
strategy.fixed_window.static.interval_unit | Unit of time for the window (second, minute, hour, day, month). | Mandatory (if used) | hour |
strategy.fixed_window.group_by_header | Group quota by the value of a specific header. | Optional | x-lunar-consumer-tag |
strategy.fixed_window.dynamic.remaining_header | Header exposing remaining quota for dynamic configurations. | Optional | X-RateLimit-Limit |
strategy.fixed_window.dynamic.reset_time_header | Header indicating reset time for dynamic quotas. | Optional | X-RateLimit-Reset |
strategy.fixed_window.dynamic.retry_after_header | Header for retry-after details in dynamic quotas. | Optional | Retry-After |
strategy.concurrent.max_request_count | Max concurrent requests allowed. | Mandatory (if used) | 50 |
strategy.concurrent.remaining_header | Header to expose remaining concurrent requests. | Optional | X-Concurrent-Remaining |
strategy.fixed_window_llm.static.max_requests | Maximum number of requests allowed within the window. | Mandatory (if used) | 100 |
strategy.fixed_window_llm.static.max_tokens | Maximum tokens allowed in the window. | Optional | 10000 |
strategy.fixed_window_llm.static.max_input_tokens | Maximum input tokens allowed. | Optional | 5000 |
strategy.fixed_window_llm.static.max_output_tokens | Maximum output tokens allowed. | Optional | 5000 |
strategy.fixed_window_llm.static.interval | Time window duration for the LLM quota. | Mandatory (if used) | 1 |
strategy.fixed_window_llm.static.interval_unit | Unit of time for the LLM window. | Mandatory (if used) | minute |
internal_limits.id | Unique identifier for child quotas. | Optional | MyChildQuota |
internal_limits.parent_id | Links the child quota to its parent quota. | Optional | MyQuota |
internal_limits.filter.url | URL pattern that the child quota applies to. | Mandatory (if used) |