When Automox was still just an idea, we hardly dared to dream that we’d be supporting the largest enterprise customers at scale so soon. Handling this sort of expansion required serious engineering effort on optimization. As a result, we’ve managed to speed up Automox by a factor of 10 while supporting over 100K endpoints per tenant.
DataLayer Optimizes Data Retrieval for Better Performance
How did we manage a ten-fold increase in performance? We built a layer of abstraction, using the Repository Pattern, between our database and front-end code. We call this layer of abstraction the “DataLayer.” DataLayer brings the following benefits:
- Performance issues are solved once in DataLayer, rather than individually by each developer in each application
- An expert in database query optimization can specialize and work on just the performance issues
- Developers don’t have to keep track of where everything is stored in the database and can spend more time on the rest of their code
- We can build DataLayer independently of the rest of our development, and slowly migrate our applications over to the newer abstraction layer when appropriate
How does this work under the hood? Rather than having our code make database calls directly, it goes through an API to talk to DataLayer. DataLayer then determines the most efficient way to obtain that data from the database and serve it up for the front-end code to consume.
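To make the Repository Pattern concrete, here is a minimal sketch in Go. It is not DataLayer's actual code: the `Device` type, the `DeviceRepository` interface, and the in-memory implementation are hypothetical names chosen for illustration, and the real service sits behind a gRPC API rather than a local interface.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Device is a hypothetical record; the real schema is not shown in this post.
type Device struct {
	ID   int
	Name string
}

// DeviceRepository is the abstraction that calling code depends on.
// Callers never see SQL or table names.
type DeviceRepository interface {
	GetDevice(id int) (Device, error)
	ListDevices() ([]Device, error)
}

// ErrNotFound is returned when no device has the requested ID.
var ErrNotFound = errors.New("device not found")

// memoryRepo is an in-memory stand-in for a database-backed implementation.
type memoryRepo struct {
	devices map[int]Device
}

// NewMemoryRepo builds a repository preloaded with the given devices.
func NewMemoryRepo(devs ...Device) DeviceRepository {
	r := &memoryRepo{devices: make(map[int]Device)}
	for _, d := range devs {
		r.devices[d.ID] = d
	}
	return r
}

func (r *memoryRepo) GetDevice(id int) (Device, error) {
	d, ok := r.devices[id]
	if !ok {
		return Device{}, ErrNotFound
	}
	return d, nil
}

// ListDevices returns all devices ordered by ID.
func (r *memoryRepo) ListDevices() ([]Device, error) {
	out := make([]Device, 0, len(r.devices))
	for _, d := range r.devices {
		out = append(out, d)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out, nil
}

func main() {
	repo := NewMemoryRepo(
		Device{ID: 1, Name: "web-01"},
		Device{ID: 2, Name: "db-01"},
	)
	d, err := repo.GetDevice(2)
	fmt.Println(d.Name, err)
}
```

Because callers only know the interface, the storage strategy behind it can be swapped or tuned without touching them, which is the property the benefits listed above rely on.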
How Does DataLayer Work in Automox?
Let’s dig into the technical details of how DataLayer optimizes Automox’s performance.
The first step was selecting the right tool for the job. We had been using pgpool for our database requests, whereas DataLayer uses Protocol Buffers (a.k.a. protobufs) and the gRPC framework. This architectural change speeds up requests in two ways: smaller payloads (all those packets going over the wire add up) and better parallelization (which means we can send more concurrent requests to read-only database replicas).
Once we had our abstraction layer in place and a clear architectural design to build upon, the next step was speeding up how we actually pull data from the database from within DataLayer. We ran extensive benchmarks so that we could make data-driven decisions about the best architectural pattern for data retrieval. That analysis showed that our extensive use of stored procedures and views was extremely inefficient. The main causes were overgeneralization, multi-table joins, and ad-hoc aggregates within the stored procedures and views. Instead, we settled on a query builder pattern to optimize our database queries. This allows us to fetch only what we need and to cut down on unconditional joins, complex subqueries, and expensive ad-hoc aggregation. It also avoids repeated joins on tables that we’ve already fetched as part of the query.
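As a rough illustration of the query builder idea, the sketch below assembles a SELECT statement that joins a table only when the caller asks for a column from it. The table names, column names, and the `DeviceQuery` type are all assumptions made for this example and do not reflect the real Automox schema or DataLayer's builder.

```go
package main

import (
	"fmt"
	"strings"
)

// DeviceQuery describes what the caller needs.
// All table and column names below are hypothetical.
type DeviceQuery struct {
	OrgID        int
	IncludeGroup bool // join device_groups only if the group name is wanted
	IncludeOS    bool // join os_info only if the OS version is wanted
}

// Build returns a parameterized SQL string and its arguments.
// Joins are added conditionally, so a caller that needs only the
// base columns pays for no joins at all.
func (q DeviceQuery) Build() (string, []interface{}) {
	cols := []string{"d.id", "d.name"}
	var joins []string

	if q.IncludeGroup {
		cols = append(cols, "g.name AS group_name")
		joins = append(joins, "JOIN device_groups g ON g.id = d.group_id")
	}
	if q.IncludeOS {
		cols = append(cols, "o.version AS os_version")
		joins = append(joins, "JOIN os_info o ON o.device_id = d.id")
	}

	var sb strings.Builder
	sb.WriteString("SELECT " + strings.Join(cols, ", ") + " FROM devices d")
	for _, j := range joins {
		sb.WriteString(" " + j)
	}
	sb.WriteString(" WHERE d.org_id = $1")
	return sb.String(), []interface{}{q.OrgID}
}

func main() {
	sql, args := DeviceQuery{OrgID: 7, IncludeGroup: true}.Build()
	fmt.Println(sql)
	fmt.Println(args...)
}
```

Contrast this with a general-purpose view, which has to perform every join up front because it cannot know which columns a given caller will read.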
Finally, we sped up fetching the data using Go’s SQL connection pooling and goroutines, which allow for massive parallelization. This lets us fetch data, particularly sub-objects, in parallel from multiple read-only database replicas.
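The parallel fetch of sub-objects might look something like the following sketch. The `Device` fields and the `fetchFunc` type are hypothetical, and the fetchers here return canned data; in a real service each one would presumably run a query through a pooled `*sql.DB` handle pointed at a read replica.

```go
package main

import (
	"fmt"
	"sync"
)

// Device and its sub-objects are hypothetical, for illustration only.
type Device struct {
	ID       int
	Packages []string
	Policies []string
}

// fetchFunc stands in for a query against a read-only replica.
type fetchFunc func(deviceID int) ([]string, error)

// LoadDevice fetches both sub-objects concurrently and waits for both.
// Each goroutine writes to its own field and its own error variable,
// so the two goroutines never touch the same memory.
func LoadDevice(id int, fetchPackages, fetchPolicies fetchFunc) (Device, error) {
	d := Device{ID: id}
	var wg sync.WaitGroup
	var pkgErr, polErr error

	wg.Add(2)
	go func() {
		defer wg.Done()
		d.Packages, pkgErr = fetchPackages(id)
	}()
	go func() {
		defer wg.Done()
		d.Policies, polErr = fetchPolicies(id)
	}()
	wg.Wait()

	if pkgErr != nil {
		return Device{}, pkgErr
	}
	if polErr != nil {
		return Device{}, polErr
	}
	return d, nil
}

func main() {
	packages := func(id int) ([]string, error) { return []string{"openssl", "curl"}, nil }
	policies := func(id int) ([]string, error) { return []string{"patch-all"}, nil }

	d, err := LoadDevice(42, packages, policies)
	fmt.Println(d, err)
}
```

With this shape, the total wait is roughly that of the slowest sub-query rather than the sum of all of them.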
Additional Improvements with DataLayer
One additional gain from this change is that DataLayer fetches all pages of the data at once, whereas the old method fetched one page at a time as you clicked from page to page. This means that when you sort on a column, the entire data set is sorted rather than just the page you are viewing.
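A small sketch of why this matters: sorting has to happen before the data is cut into pages for page 1 to hold the true first results. The `SortThenPage` function and the device names are made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// SortThenPage sorts a copy of the complete data set, then returns
// one page of it. Pages are 1-indexed. The input slice is not modified.
func SortThenPage(names []string, page, size int) []string {
	if page < 1 || size < 1 {
		return nil
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)

	start := (page - 1) * size
	if start >= len(sorted) {
		return nil
	}
	end := start + size
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[start:end]
}

func main() {
	names := []string{"web-02", "db-01", "app-03", "cache-01"}
	fmt.Println(SortThenPage(names, 1, 2))
	fmt.Println(SortThenPage(names, 2, 2))
}
```

Sorting only the rows already on screen would instead reorder "web-02" and "db-01" among themselves and never surface "app-03" on the first page.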
Now that phase one of our DataLayer project is complete, we’re systematically replacing all of the existing database calls in our code with newer DataLayer gRPC calls. The first step was the Devices page, which went live in the Nov 27th release.
In the future, we’re planning additional improvements, such as gRPC streaming and memoized prepared statements. Streaming allows us to start sending the return data from the database over the wire as it is retrieved, rather than waiting for the query to finish before returning the full payload. Memoized prepared statements are a way of caching the built queries for reuse, saving time in the query builder, which is the most time-consuming part of DataLayer. You can think of it as the best of both worlds between stored procedures and a query builder.
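The memoization idea can be sketched as a cache keyed by the shape of the query, so the builder runs only the first time a given shape is requested. This is an assumption about how such a cache could work, not a description of the planned implementation: the `QueryCache` type and the key names are hypothetical, and the sketch caches the SQL text, where a production version would more likely cache a prepared `*sql.Stmt`.

```go
package main

import (
	"fmt"
	"sync"
)

// QueryCache memoizes built SQL by a caller-supplied key.
type QueryCache struct {
	mu     sync.Mutex
	built  map[string]string
	Builds int // how many times a builder function actually ran
}

// NewQueryCache returns an empty cache.
func NewQueryCache() *QueryCache {
	return &QueryCache{built: make(map[string]string)}
}

// Get returns the SQL for key, calling build only on the first request
// for that key. The mutex makes it safe to call from many goroutines.
func (c *QueryCache) Get(key string, build func() string) string {
	c.mu.Lock()
	defer c.mu.Unlock()

	if sql, ok := c.built[key]; ok {
		return sql
	}
	sql := build()
	c.built[key] = sql
	c.Builds++
	return sql
}

func main() {
	cache := NewQueryCache()
	build := func() string {
		// Stand-in for an expensive query-builder run.
		return "SELECT id, name FROM devices WHERE org_id = $1"
	}

	first := cache.Get("devices:basic", build)
	second := cache.Get("devices:basic", build)
	fmt.Println(first == second, cache.Builds)
}
```

The key should capture everything that changes the generated SQL (requested columns, joins, filters) but not the parameter values, which are still bound per request.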
Between these two additions, we expect to see an additional ten-fold improvement in performance. Stay tuned for more information on these improvements in the future.