
LLM Routing Flow

The LLM Routing Flow enables dynamic selection of LLMs based on user-defined conditions that are evaluated, using the Filter Processor, before each request is sent upstream. This flow is designed to optimize performance, cost, or quality by routing every request to the most appropriate model. For example, shorter prompts can be directed to GPT-3.5 for faster, lower-cost responses, while longer or more complex prompts can be sent to GPT-4, Claude, or other models.
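Concretely, each routing condition is a Filter processor whose expression inspects the incoming request. A minimal sketch in the same style as the full example later on this page (the processor name and model are illustrative, not part of the example):

```yaml
# Illustrative Filter processor: matches requests whose body names a
# short-context model. Matching requests take the processor's "hit"
# branch; all others take "miss".
FilterShortPrompt_0:
  processor: Filter
  parameters:
    - key: expressions
      value:
        - $.request[?(@.body.model == "gpt-3.5-turbo")]
```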

Flow Diagram


Scenarios

  1. Requests Over Token Limit: Requests can be sent to different LLM models depending on their token count.
  2. Requests With Specific Tasks: Requests with a specific task, such as text-to-image, will be sent to a specific LLM model.
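For the second scenario, a Filter processor can match on the request URL rather than on token count. A hedged sketch, assuming the OpenAI image-generation endpoint path (the processor name and path are assumptions, not taken from the example below):

```yaml
# Hypothetical filter for text-to-image traffic: match requests sent
# to the image-generation endpoint, so they can be routed to a model
# that supports that task.
FilterImageTask_0:
  processor: Filter
  parameters:
    - key: url
      value: api.openai.com/v1/images/generations
```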

Flow Components

This flow is composed of the Filter, CountLLMTokens, and TransformAPICall processors, configured as shown in the example below.

Flow Example

Here is an example of a fully configured LLM Routing Flow. In this case, if a request targets OpenAI GPT-4.5 and its estimated token count is 1000 or more, the request is transformed and sent to OpenAI GPT-4.1 instead.

/etc/lunar-proxy/flows/flow.yaml
name: LLMRoutingFlow

filter:
  url: api.openai.com/*

processors:
  FilterModel_gpt-4_5_0:
    processor: Filter
    parameters:
      - key: expressions
        value:
          - $.request[?(@.body.model == "gpt-4.5")]
  CountTokens_gpt-4_5_0:
    processor: CountLLMTokens
    parameters:
      - key: store_count_header
        value: x-lunar-estimated-tokens
      - key: model
        value: gpt-4.5
  FilterTokensCount_gpt-4_5_0:
    processor: Filter
    parameters:
      - key: header
        value: x-lunar-estimated-tokens >= 1000
  TransformModel_gpt-4_5_0:
    processor: TransformAPICall
    parameters:
      - key: set
        value:
          $.request.body.model: gpt-4.1

flow:
  request:
    - from:
        stream:
          name: globalStream
          at: start
      to:
        processor:
          name: FilterModel_gpt-4_5_0
    - from:
        processor:
          name: FilterModel_gpt-4_5_0
          condition: hit
      to:
        processor:
          name: CountTokens_gpt-4_5_0
    - from:
        processor:
          name: FilterModel_gpt-4_5_0
          condition: miss
      to:
        stream:
          name: globalStream
          at: end
    - from:
        processor:
          name: CountTokens_gpt-4_5_0
      to:
        processor:
          name: FilterTokensCount_gpt-4_5_0
    - from:
        processor:
          name: FilterTokensCount_gpt-4_5_0
          condition: miss
      to:
        stream:
          name: globalStream
          at: end
    - from:
        processor:
          name: FilterTokensCount_gpt-4_5_0
          condition: hit
      to:
        processor:
          name: TransformModel_gpt-4_5_0
    - from:
        processor:
          name: TransformModel_gpt-4_5_0
      to:
        stream:
          name: globalStream
          at: end
  response:
    - from:
        stream:
          name: globalStream
          at: start
      to:
        stream:
          name: globalStream
          at: end
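The same pattern extends to two-way routing. A hypothetical variation (not part of the example above): instead of letting requests under the token threshold pass through unchanged, point the miss branch of FilterTokensCount_gpt-4_5_0 at a second TransformAPICall that downgrades the model. The processor name and target model below are illustrative assumptions:

```yaml
# Hypothetical extension: downgrade sub-threshold requests to a
# cheaper model. This edge would replace the existing
# FilterTokensCount miss -> globalStream end edge.
processors:
  TransformModelCheap_0:
    processor: TransformAPICall
    parameters:
      - key: set
        value:
          $.request.body.model: gpt-4.1-mini

flow:
  request:
    - from:
        processor:
          name: FilterTokensCount_gpt-4_5_0
          condition: miss
      to:
        processor:
          name: TransformModelCheap_0
    - from:
        processor:
          name: TransformModelCheap_0
      to:
        stream:
          name: globalStream
          at: end
```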