Managing database access with ActiveRecord

Venky Iyer, 16 Jan 2018


I have previously written about some techniques we are using at Fin to combat software entropy in our Ruby on Rails app. One of the most common problems that we deal with has to do with performance degradation due to database queries — unnecessary queries, inefficient queries, queries that should have been batched etc.

In future posts I will discuss approaches to actually optimize database queries, but a good first step is to understand where database accesses are happening, and how to control them. In this post I will describe how we can measure, annotate, and control database access using the ActiveRecord ORM.

Logging database access in Rails console

The simplest tool in our kit is enables logging database queries for human perusal, as they happen in the Rails console. We use the wonderful [pry](http://pryrepl.org/) for our Rails console (instructions to set this up are here). In our .pryrc, we have defined a custom command called !debug :

We can now type !debug in the Rails console to set the Rails logger log-level to DEBUG, and also print out every ActiveRecord query as it executes.

How does this work? It takes advantage of ActiveSupport’s notifications mechanism, which lets us subscribe to an event stream of every ActiveRecord query. We’ve abstracted some of the logic for this into the ActiveRecordQueryCounter class:

ActiveRecordQueryCounter is a Singleton counter that when initialized, subscribes to sql.active_record events and tracks their count. It also lets us print a curated (and optionally truncated) stack trace of the code the query originates from; this is crucial to finding the culprit when you notice something off! 🕵

Logging database access programmatically

It is easy to see how we can extend this pattern to define a log_query_count utility that can be used to wrap arbitrary code blocks and add query logging to them:

For example:

Controlling database access

Now that we can watch and count database queries, we would like to control where they happen. Database queries are typically where most of the time in spent in a request. The n+1 query problem is well documented.

A particularly insidious kind of performance regression happens when you make a change to a method that was previously optimized to preloads all the data it needs, and inadvertently introduce new queries. It is non-trivial to predict where database queries will occur, as a consequence of ActiveRecord’s features that preload or cache associations under the hood. The logging extensions that I described above can help you identify what qeries are happening when you decide to investigate performance, but it is valuable to be able to raise an alarm when regressions are introduced, with respect to where database queries are expected.

This can be achieved by an extension of our log_query_count block above; we can define a with_no_db block which will raise exceptions if any queries occur within it. Here’s how this works:

We begin by monkey-patching ActiveRecord.connection.execute to test a global db_access_mode flag. This logic is encapsulated in our ExecutionControl module below, and included into the ActiveRecord connection adapter:

We then define the with_no_db and with_db blocks that will manage the db_access_mode flag:

with_no_db will throw exceptions on querying, like so:

By default, we use a strict version of with_no_db which cannot be overridden by a nested with_db block. This constrains all code that is invoked within the block to act without accessing the database. In some situations, we can choose to allow nested code to access the database, as long as it is annotated as with_db.

We also have a report_only version, which logs warnings to Rollbar instead of throwing exceptions. This can be useful for some critical codepaths where we prefer taking the performance hit to failing entirely.

We have only recently begun to use these extensions; but it’s already clear that they aren’t trivial to put into production in a mature app. Assumptions about on-demand querying are probably deeply ingrained in any significant Rails app, and it is also likely that a normal test suite isn’t sufficient to exercise these assumptions completely. It’s easy to ship a with_no_db block and have it unexpectedly throw exceptions. The report_only flag is a good way to start logging all such potential exceptions, and then switch to strict mode after gaining sufficient confidence that all inbound codepaths have been adjusted to preload data.

Conclusion

Query logging and database access control can be valuable tools to debug the performance of a Rails app. What are your favorite techniques to manage these? Let us know in the comments.

Thanks for reading!