Quota Strategies
This document outlines the available quota strategies with detailed explanations, examples, and guidance on usage.
Fixed Window Strategy
The Fixed Window strategy is a rate-limiting approach that restricts requests within a predefined time window. Once the limit is reached within the window, subsequent requests are blocked until the window resets.
- Defines a maximum number of requests (max) allowed within a specific interval.
- The interval is configured by specifying the duration (e.g., 24 hours), after which the counter resets.
- Can be grouped by a specific header, such as x-lunar-consumer-tag, allowing limits to be applied per user or subscription level.
- Supports both static and dynamic configurations.
quotas:
  - id: FixedWindowQuota # Unique identifier for the quota
    filter: # Define filter conditions for this quota
      url: api.website.com/* # URL pattern to apply the quota
    strategy:
      fixed_window:
        static:
          max: 1000 # Maximum requests allowed within the window
          interval: 24 # Window duration
          interval_unit: hour # Unit of time for the interval (second/minute/hour/day/month)
          group_by_header: x-lunar-consumer-tag # Optional: Group by header
          monthly_renewal:
            day: 1
            hour: 0
            minute: 0
            timezone: UTC # Optional: Monthly renewal configuration
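To make the counting behavior concrete, here is a minimal Python sketch of a fixed-window counter with optional grouping. It is an illustration only, not Lunar's implementation: the class name `FixedWindowLimiter`, its methods, and the choice to start each window at a group's first request are all assumptions made for this example.

```python
import time


class FixedWindowLimiter:
    """Illustrative fixed-window counter (hypothetical, not Lunar's code)."""

    def __init__(self, max_requests, interval_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.interval = interval_seconds
        self.clock = clock  # injectable so the sketch can be tested without sleeping
        self._windows = {}  # group key -> (window_start, request_count)

    def allow(self, group="default"):
        """Return True if a request for `group` fits in the current window."""
        now = self.clock()
        start, count = self._windows.get(group, (now, 0))
        if now - start >= self.interval:
            # The window has elapsed: open a new one with a zeroed counter.
            start, count = now, 0
        if count >= self.max_requests:
            self._windows[group] = (start, count)
            return False  # blocked until the window resets
        self._windows[group] = (start, count + 1)
        return True
```

Passing the value of a header such as x-lunar-consumer-tag as `group` gives each consumer its own counter, which mirrors the grouping option described above. The sketch is not thread-safe; a real proxy would need locking or atomic counters.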
Concurrent Strategy
The Concurrent strategy sets a limit on the number of simultaneous requests rather than tracking requests within a time window. This approach helps manage concurrent traffic across various flows and endpoints, reducing server strain during high-traffic periods.
- Defines a maximum allowed concurrent requests limit (max_request_count).
- If the limit is reached, additional requests are blocked until active requests finish, freeing up capacity for new requests.
- Supports optional headers for reporting remaining limits.
quotas:
  - id: ConcurrentQuota # Unique identifier for the quota
    filter: # Define filter conditions for this quota
      url: api.website.com/* # URL pattern to apply the quota
    strategy:
      concurrent:
        max_request_count: 50 # Maximum concurrent requests allowed
        remaining_header: X-Concurrent-Remaining # Optional: Header to expose remaining concurrent requests
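The difference from a time window is that capacity is returned as soon as a request finishes. The following Python sketch shows that idea under stated assumptions: `ConcurrentLimiter`, `try_acquire`, `release`, and `remaining` are hypothetical names for this example, and the sketch rejects requests over the limit immediately rather than queueing them.

```python
import threading


class ConcurrentLimiter:
    """Illustrative in-flight request cap (hypothetical, not Lunar's code)."""

    def __init__(self, max_request_count):
        self.max_request_count = max_request_count
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Claim a slot; return False if the limit is already reached."""
        with self._lock:
            if self._active >= self.max_request_count:
                return False
            self._active += 1
            return True

    def release(self):
        """Free a slot when a request finishes."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    def remaining(self):
        """Slots still available, e.g. the value a remaining header could report."""
        with self._lock:
            return self.max_request_count - self._active
```

A caller would wrap each upstream call in `try_acquire()` and `release()` (ideally in a `try`/`finally`) so that a failed request still returns its slot.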
Fixed Window LLM Strategy
The Fixed Window LLM strategy provides granular control over LLM API traffic, enabling limits based on tokens and requests within a fixed time window.
Supports configurations for:
- Maximum requests allowed (max_requests).
- Token-based limits, including total tokens (max_tokens), input tokens (max_input_tokens), and output tokens (max_output_tokens).
- Flexible time windows for rate limiting (e.g., minute, hour).
quotas:
  - id: FixedWindowLLMQuota # Unique identifier for the quota
    filter:
      url: api.openai.com/* # URL pattern to apply the quota
    strategy:
      fixed_window_llm:
        static:
          max_requests: 100 # Maximum requests allowed within the window
          max_tokens: 10000 # Maximum tokens allowed within the window
          interval: 1
          interval_unit: minute # Time window duration in minutes
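One way to picture a window that tracks both requests and tokens is the Python sketch below. It is a simplified illustration, not Lunar's implementation: the names are hypothetical, only total tokens are tracked, and it assumes token usage is recorded after each call completes, since the count is usually known only from the API response.

```python
import time


class FixedWindowLLMLimiter:
    """Illustrative request-and-token window (hypothetical, not Lunar's code)."""

    def __init__(self, max_requests, max_tokens, interval_seconds,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.interval = interval_seconds
        self.clock = clock
        self._window_start = None
        self._requests = 0
        self._tokens = 0

    def _roll(self):
        # Open a new window (and zero both counters) once the old one elapses.
        now = self.clock()
        if self._window_start is None or now - self._window_start >= self.interval:
            self._window_start = now
            self._requests = 0
            self._tokens = 0

    def allow(self):
        """Admit a request only if both budgets still have room."""
        self._roll()
        if self._requests >= self.max_requests or self._tokens >= self.max_tokens:
            return False
        self._requests += 1
        return True

    def record_tokens(self, tokens):
        """Add the token usage reported for a completed call."""
        self._roll()
        self._tokens += tokens
```

Because usage is recorded after the fact, a single large response can push the window over its token budget; later requests are then blocked until the window resets.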
Choosing the Right Strategy
When selecting a quota strategy, keep in mind that as an API consumer you're typically bound by the quota configuration set by the API provider. For example, if the provider enforces a Fixed Window rate limit, your application should be designed to stay within it. For internal limits within your own systems, especially those nested under a parent quota, you have more flexibility to structure them around your specific traffic management needs.
Use Case | Recommended Strategy | Description |
---|---|---|
Steady Traffic Control | Fixed Window | Suitable for consistent rate-limiting needs, such as capping requests within a fixed time frame. |
Burst Handling | Fixed Window | Helps manage and contain traffic spikes within a defined window, preventing overload. |
High Concurrency | Concurrent | Limits the number of active requests, ideal for environments with high real-time demand. |
LLM API Usage | Fixed Window LLM | Controls requests and tokens to ensure efficient use of LLM APIs. |
User-Based Quotas | Fixed Window | Groups requests by user-level headers (e.g., x-lunar-consumer-tag) for differentiated user or subscription-tier quotas. |
Service Load Balancing | Concurrent | Controls simultaneous connections to balance load across your resources dynamically. |
Advanced Example: Combining Strategies with Internal Limits
In more complex scenarios, such as managing quotas across different user levels or services, users can configure internal limits that apply both Fixed Window and Concurrent strategies within the same configuration file.
quotas:
  - id: CombinedQuota # Unique identifier for the main quota
    filter:
      url: api.website.com/* # URL pattern to apply the quota
    # Main quota strategy using Fixed Window
    strategy:
      fixed_window:
        static:
          max: 5000 # Maximum requests allowed in the main quota window
          interval: 1
          interval_unit: day
          group_by_header: x-lunar-consumer-tag # Optional grouping
# Nested concurrent limit for premium users within the main quota
internal_limits:
  - id: PremiumConcurrentLimit
    parent_id: CombinedQuota
    filter:
      headers:
        - key: x-lunar-consumer-tag
          value: premium
    strategy:
      concurrent:
        max_request_count: 100 # Max concurrent requests for premium users
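As a rough mental model of how a parent quota and a nested limit interact, the Python sketch below admits a request only when both checks pass. Everything here is an assumption for illustration: the `CombinedQuota` class, its `begin`/`end` methods, and in particular the order of evaluation (nested limit first, so a rejected request does not consume parent quota) are not taken from Lunar's documentation.

```python
import time

TAG_HEADER = "x-lunar-consumer-tag"


class CombinedQuota:
    """Illustrative parent window plus nested concurrency cap (hypothetical)."""

    def __init__(self, daily_max, premium_max_concurrent, clock=time.monotonic):
        self.daily_max = daily_max
        self.premium_max_concurrent = premium_max_concurrent
        self.clock = clock
        self.interval = 24 * 60 * 60  # one day, in seconds
        self._windows = {}  # consumer tag -> (window_start, request_count)
        self._premium_active = 0

    def begin(self, headers):
        """Return True if the request passes both the nested and parent limits."""
        tag = headers.get(TAG_HEADER, "")
        is_premium = tag == "premium"
        # Nested limit: cap in-flight requests for premium consumers.
        if is_premium and self._premium_active >= self.premium_max_concurrent:
            return False
        # Parent limit: fixed daily window, counted per consumer tag.
        now = self.clock()
        start, count = self._windows.get(tag, (now, 0))
        if now - start >= self.interval:
            start, count = now, 0
        if count >= self.daily_max:
            return False
        self._windows[tag] = (start, count + 1)
        if is_premium:
            self._premium_active += 1
        return True

    def end(self, headers):
        """Release the premium concurrency slot when a request finishes."""
        if headers.get(TAG_HEADER) == "premium" and self._premium_active > 0:
            self._premium_active -= 1
```

The sketch is single-threaded for brevity; a shared deployment would need synchronization around both counters.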