In the ever-evolving landscape of the API economy, time emerges as one of our most precious resources. Swift response times have become essential for both API providers and consumers. As providers strive to meet response time percentiles, consumers are becoming more aware of these metrics, often establishing service level agreements (SLAs) based on them.
At Lunar, we recognize the significance of latency. Our solution addresses the complex challenges of API consumption by acting as a bridge between API providers and consumers. Naturally, we aim to minimize any impact on our users' existing latency. As developers, we knew instinctively that this is precisely what we would want as end users.
To achieve a nearly invisible latency footprint, extensive research went into selecting the ideal stack and architecture. However, any assumption must undergo rigorous testing to validate its accuracy. This is where latency benchmarking becomes invaluable.
Lunar's Performance Footprint
As described on our Architecture page, Lunar operates alongside our users' applications, handling all outgoing traffic directed towards third-party providers. While its default behavior involves seamless forwarding of requests and responses, akin to forward proxies, its true brilliance materializes when augmented with remedy and diagnosis plugins.
While remedy plugins can modify requests and responses—e.g., retrieving responses from cache without issuing actual provider requests—diagnosis plugins, by their nature, lack such transformative capabilities.
Through our benchmarking sessions, we wanted to unveil Lunar's latency footprint on response time percentiles in comparison to direct API calls made without Lunar's intervention. In our experiments, we have selected a provider with a constant response time of 150ms, which is remarkably fast for a web-based API. In addition, we wanted to examine whether a correlation exists between the provider's response time and Lunar's latency footprint, or if, ideally, it remains constant regardless of the API provider's response time.
To accomplish this, we explore the following scenarios:
- Traffic traversal through Lunar's offering without any remediation or diagnosis plugins.
- Traffic routed via Lunar Proxy while employing a simple remedy plugin that doesn't short circuit requests.
- Traffic channeled through Lunar Proxy, accompanied by a straightforward diagnosis plugin, our HAR File Creator.
We allocated two separate AWS EC2 instances of type c5a.large for this purpose: one dedicated to the provider only, and another dedicated to the client application and Lunar. This is key: there will always be real network time when calling API providers; hence, the separate EC2 instances are crucial here. In contrast, Lunar’s product is designed to be located as close as possible to the client application that integrates with it, so it makes sense to place these two on the same EC2 instance.
We used Apache AB to simulate client-side behavior and gather the necessary metrics. To replicate how client applications interact with Lunar, we directed Apache AB to call Lunar, which, in turn, forwarded the requests to the provider.
In our performance analysis, in addition to setting a baseline of "Direct calls to the provider" (referred to as "direct"), we conducted three experiments to compare the following scenarios:
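As a rough illustration of what one such run could look like, the sketch below assembles an ApacheBench (`ab`) command line in Python. The flags used (`-n` for the total number of requests, `-c` for concurrency, `-e` for a CSV file of per-percentile response times) are standard `ab` options, but the target URL, request count, concurrency, and output file name are hypothetical placeholders, not the values used in our benchmark. The target URL stands for whichever endpoint a given experiment calls, either the provider directly or Lunar.

```python
import shlex


def build_ab_command(target_url, total_requests, concurrency, csv_path):
    """Build an ApacheBench (ab) argument list for a single benchmark run."""
    # ab refuses to run when concurrency exceeds the total number of requests.
    if total_requests < concurrency:
        raise ValueError("total_requests must be >= concurrency")
    return [
        "ab",
        "-n", str(total_requests),  # total number of requests to send
        "-c", str(concurrency),     # number of requests in flight at once
        "-e", csv_path,             # write percentile/time pairs to a CSV file
        target_url,
    ]


# Hypothetical values, for illustration only.
command = build_ab_command("http://provider.example:8080/", 10000, 32, "direct.csv")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` as-is; running the same command once per experiment, with a different target and CSV file name each time, yields one percentile file per scenario.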
- Calls to the provider via Lunar Proxy without any remediation or diagnosis (referred to as "with-lunar-installed").
- Calls to the provider via Lunar Proxy with remediation (referred to as "with-lunar-remedy").
- Calls to the provider via Lunar Proxy with diagnosis (referred to as "with-lunar-diagnosis").
The visualization below presents percentile values on the X-axis and runtime in milliseconds on the Y-axis, highlighting the differences between each experiment and the baseline direct experiment, which are relatively small across percentiles.
To provide a clearer understanding, let's examine the same graph with the Y-axis representing the delta from the baseline for each experiment and percentile. In the direct experiment, the delta remains 0 for all percentiles, as it is compared against itself.
- At the 99th percentile, Lunar Proxy adds between 4ms (with-lunar-installed) and 9ms (with-lunar-diagnosis) to the overall response time of HTTP calls, depending on the usage.
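The delta-from-baseline view described above can be reproduced from raw latency samples. The following is a minimal sketch, assuming each experiment's response times are available as a list of milliseconds; the function names and the nearest-rank percentile method are our choices for illustration, not necessarily what the benchmarking tool computes internally.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of latency samples (milliseconds)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def delta_from_baseline(baseline, experiment, percentiles=(50, 90, 95, 99)):
    """Per-percentile latency added by an experiment relative to the baseline."""
    return {
        p: percentile(experiment, p) - percentile(baseline, p)
        for p in percentiles
    }
```

Comparing the "direct" samples against themselves returns 0 for every percentile, which matches the flat baseline line in the delta graph.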
Through our benchmarking sessions, we also aimed to assess how many requests per second Lunar can handle under various load conditions. For this performance evaluation, we established an EKS cluster and node group with the provided hardware configuration, ensuring a reliable and scalable infrastructure. Using our benchmarking tool, we generated load on Lunar to measure its performance within the EKS cluster. By systematically varying load parameters, including the number of concurrent connections and the request rate, we sought to understand how Lunar performed under different scenarios.
Our objective was to uncover the efficiency and capability of Lunar, identifying potential bottlenecks or areas for improvement in its capacity to handle substantial workloads.
The results show the requests per second achieved by Lunar in each scenario, providing insights into its performance characteristics. With concurrency levels ranging from 32 to 256, and capacity limits from 1 to 8 cores, we observed a clear correlation between these factors and the resulting requests per second. These findings contribute to a deeper understanding of Lunar's performance and can assist in optimizing its configuration for enhanced capacity and throughput.
| Concurrency | Capacity Limit (Cores) | Number of Requests | Requests per Second |
In this table, the columns represent the following:
- Concurrency: The number of requests in flight at the same time.
- Capacity Limit (Cores): The specified limit on the number of cores for Lunar's capacity.
- Number of Requests: The total number of requests performed in the benchmark.
- Requests per Second: The average number of requests per second that Lunar Proxy handled.
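To make the relationship between these columns concrete, here is a small sketch of how a single row could be derived from a run. It assumes the benchmarking tool reports the total wall-clock duration of the run; the `CapacityRun` class, its field names, and the numbers in the example are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class CapacityRun:
    """One row of the capacity table, plus the run duration it is derived from."""
    concurrency: int          # requests in flight at the same time
    core_limit: int           # capacity limit, in cores
    total_requests: int       # total requests performed in the run
    elapsed_seconds: float    # wall-clock duration of the run (assumed to be reported)

    def requests_per_second(self):
        """Average throughput over the whole run."""
        if self.elapsed_seconds <= 0:
            raise ValueError("elapsed_seconds must be positive")
        return self.total_requests / self.elapsed_seconds


# Placeholder numbers for illustration only, not benchmark output.
run = CapacityRun(concurrency=64, core_limit=2, total_requests=100_000, elapsed_seconds=4.0)
print(run.requests_per_second())
```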
Through extensive benchmarking, Lunar demonstrates impressive results. It adds only a small latency footprint to response times, ranging from 4ms to 37ms at the 95th and 99th percentiles, respectively. The capacity benchmark shows that Lunar efficiently handles substantial workloads, achieving up to 84,867 requests per second. These outstanding results validate Lunar's effectiveness in minimizing latency and ensuring optimal performance in API interactions.