As developers, one of our perpetual goals is to ensure our systems, such as our exchange matching engine software, run as efficiently as possible, using the minimum required resources to achieve maximum performance. Recently, we embarked upon an effort to reduce EP3™’s memory footprint by auditing our live memory usage and attempting to trim out as much unnecessary overhead as we could find.
Our team was inspired by the needs of our high-volume customers – who push an enormous volume of orders through our exchange matching software daily. The result? We were able to:
- inspect our memory heap
- cut our steady-state memory usage by roughly 10x
- minimize bursts of memory required during peak activity
Read on to find out how we executed this revamp of our matching engine technology.
Deploying Go’s pprof Tooling
We turned to Go’s built-in pprof tooling – a powerful suite for profiling applications through visualization and analysis of runtime data. It lets us analyze CPU usage, memory allocation, and more while our services run live in a test environment. For our purposes, we focused on heap profiling to understand and optimize memory usage under high load in a real Kubernetes cluster.
First, we needed to enable pprof across all of our Go microservices. This is remarkably straightforward: by importing the net/http/pprof package and setting up a temporary HTTP server, we could access various profiling endpoints while ensuring that the debugging behavior is off by default for production workloads.
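A minimal sketch of that setup follows. The ENABLE_PPROF environment variable and port 6060 are illustrative choices here, not EP3’s actual configuration:

```go
// main.go – a sketch of enabling pprof behind an opt-in flag.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
	"os"
)

func main() {
	// Keep the debug server off unless explicitly enabled, so production
	// workloads don't expose profiling endpoints by default.
	if os.Getenv("ENABLE_PPROF") == "true" {
		go func() {
			log.Println(http.ListenAndServe("localhost:6060", nil))
		}()
	}

	// ... start the service as usual ...
	select {}
}
```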
With this setup, we could access the pprof endpoints by navigating to http://localhost:&lt;our forwarded port&gt;/debug/pprof/ in our browser after forwarding each service’s port from the Kubernetes cluster to our local development machine. To dump the heap after forcing a garbage collection event, we hit the GET /debug/pprof/heap?gc=1 endpoint and saved the result locally. Then, we ran go tool pprof -http=:8082 heap.out to visualize the heap in the browser.
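As an aside, the same dump can be produced in-process without the HTTP endpoint at all. This sketch uses the standard runtime/pprof package rather than EP3’s actual capture path; the heap.out filename simply matches the command above:

```go
// dumpheap.go – a sketch of capturing a heap profile in-process,
// equivalent to hitting /debug/pprof/heap?gc=1 over HTTP.
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Force a garbage collection first so the profile reflects live
	// objects rather than garbage awaiting collection, mirroring the
	// ?gc=1 query parameter.
	runtime.GC()

	f, err := os.Create("heap.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write the heap profile in the format go tool pprof expects.
	if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
		log.Fatal(err)
	}
}
```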
Through this view, we determined that two internal caches were unnecessarily large for their particular use cases within our codebase, so we reduced their sizes. This eliminated a large chunk of our memory footprint, but we were still seeing modest utilization increases under bursts of exchange activity. We believed we could still do better!
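The caches themselves are internal to EP3, but the change amounted to putting a hard cap on how many entries each one retains. Here is a simplified, hypothetical sketch of that idea; boundedCache and its FIFO eviction policy are illustrative, not our actual implementation:

```go
// boundedcache.go – a hypothetical sketch of capping a cache's size;
// EP3's actual caches are internal and not shown here.
package main

import "fmt"

type boundedCache struct {
	max  int
	data map[string][]byte
	keys []string // insertion order; oldest entries are evicted first
}

func newBoundedCache(max int) *boundedCache {
	return &boundedCache{max: max, data: make(map[string][]byte, max)}
}

func (c *boundedCache) Put(key string, val []byte) {
	if _, ok := c.data[key]; !ok {
		// Evict the oldest entry once the cap is reached, keeping the
		// cache's memory footprint bounded instead of growing with
		// every unique key.
		if len(c.keys) >= c.max {
			oldest := c.keys[0]
			c.keys = c.keys[1:]
			delete(c.data, oldest)
		}
		c.keys = append(c.keys, key)
	}
	c.data[key] = val
}

func (c *boundedCache) Get(key string) ([]byte, bool) {
	v, ok := c.data[key]
	return v, ok
}

func main() {
	c := newBoundedCache(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Put("c", []byte("3")) // evicts "a"
	_, ok := c.Get("a")
	fmt.Println(ok) // false
}
```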
An Assist From gogctuner
gogctuner is a library designed to fine-tune Go’s garbage collector (GC). By default, the GC is conservative about releasing memory back to the OS, which can lead to higher memory usage than necessary. This library provides the means to adjust the aggressiveness of the garbage collector, allowing us to strike a balance between application performance and memory usage.
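We won’t walk through gogctuner’s internals here, but the knob this style of tuner turns is the standard GOGC percentage. Below is a minimal sketch of dynamic GC tuning using only the standard library; the 2 GiB limit, 80% threshold, and GOGC values are illustrative rather than our production settings:

```go
// gctune.go – a sketch of the kind of runtime GC tuning a library like
// gogctuner performs; the limit and thresholds here are illustrative.
package main

import (
	"runtime"
	"runtime/debug"
	"time"
)

// memoryLimitBytes is a hypothetical container memory limit; a real
// tuner would read this from cgroups.
const memoryLimitBytes = 2 << 30 // 2 GiB

func tuneGC() {
	var m runtime.MemStats
	for range time.Tick(10 * time.Second) {
		runtime.ReadMemStats(&m)
		if float64(m.HeapInuse) > 0.8*float64(memoryLimitBytes) {
			// Close to the limit: collect aggressively, trading CPU
			// for a smaller heap.
			debug.SetGCPercent(25)
		} else {
			// Plenty of headroom: relax back to the default.
			debug.SetGCPercent(100)
		}
	}
}

func main() {
	go tuneGC()
	// ... run the service ...
	select {}
}
```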
With more frequent garbage collection and memory release, our system experienced more consistent performance and reduced latency variance across all of EP3. The reduced memory usage also meant less pressure on each node’s memory in Kubernetes, leading to overall better resource utilization.
Go’s pprof tooling proved invaluable in our quest to optimize memory usage and boost system performance. By systematically identifying and addressing memory hotspots, we improved our application’s efficiency and gained deeper insight into Go’s memory management.
Going Forward
We’ve identified and optimized memory hotspots, so our matching engine technology, EP3, continues to operate like a well-oiled machine. For any developer looking to optimize their Go applications, pprof is an essential tool. It provides a window into your application’s performance, helping you make informed decisions and implement effective optimizations. Now, we want all our customers to benefit from these enhancements. Tuning recommendations and more details are available in our documentation.