Everyone is talking about Ruby performance lately, and with good reason. It turns out that with some smallish tweaks to your code it's possible to increase performance by up to 99.9%.

There are plenty of articles out there on how to optimize your code, but how can you make sure your code remains optimized?

You may not always consider the consequences when embedding a string literal rather than a frozen constant in a regularly called method -- it's much too easy to lose the savings of your optimizations when maintaining your code in the future.

These were my thoughts recently as I optimized some code for the second (or third) time in our Ruby gem at Honeybadger: "wouldn't it be great if there were a way to ensure that these optimizations don't regress?"

Regressions are something most of us are familiar with in software development, even if not by name. A regression happens when a bug or an issue which was resolved in the past reoccurs due to a future change to the same code. Nobody likes to do the same work more than once; regressions are like tracking dirt on the floor right after it's been swept.

Luckily, we have a secret weapon: tests. Whether you practice dogmatic TDD or not, tests are awesome for fixing bugs because they demonstrate the issue and the solution programmatically. Tests give us confidence that regressions won't happen when changes do.

Sound familiar? I thought so too, which made me wonder, "if performance optimizations can regress, why can't I catch those regressions with tests, too?"

There are a lot of great tools for profiling various performance aspects of Ruby including object allocations, memory, CPU, garbage collection, etc. Some of these include ruby-prof, stackprof and allocation_tracer.

I've recently been using allocation_stats to profile object allocations. Reducing allocations is a fairly easy task to accomplish, yielding a lot of low-hanging fruit for tuning memory consumption and speed.

For example, here's a basic Ruby class which stores an Array of 5 strings which default to 'foo':

class MyClass
  def initialize
    @values = Array.new(5)
    5.times { @values << 'foo' }
  end
end

The AllocationStats API is simple. Give it a block to profile, and it will print out where the most objects are allocated.

$ ruby -r allocation_stats -r ./lib/my_class
stats = AllocationStats.trace { MyClass.new } 
puts stats.allocations(alias_paths: true).group_by(:sourcefile, :sourceline, :class).to_text
^D
     sourcefile        sourceline   class   count
---------------------  ----------  -------  -----
/lib/my_class.rb           4       String       5
/lib/my_class.rb           3       Array        1
-                          1       MyClass      1

The #to_text method (called on a group of allocations) simply prints out a nice human-readable table grouped by whatever criteria you ask for.

This output is great when profiling manually, but my goal was to create a test which could run alongside my normal unit test suite (which is written in RSpec).We can see that on line 4 of my_class.rb, 5 strings are being allocated, which seems unnecessary since I know they all contain the same value. I wanted my scenario to read something like: "when initializing MyClass it allocates under 6 objects". In RSpec this looks something like:

describe MyClass do
  context "when initializing" do
    specify { expect { MyClass.new }.to allocate_under(6).objects }
  end
end

Using this syntax I have everything I need to test that object allocations are less than a given number for the described block of code (inside the expect block) using a custom RSpec matcher.

In addition to printing the trace results, AllocationStats provides a few methods for accessing the allocations via Ruby, including #allocations and #new_allocations. These are what I used to build my matcher:

begin
  require 'allocation_stats'
rescue LoadError
  puts 'Skipping AllocationStats.'
end

RSpec::Matchers.define :allocate_under do |expected|
  match do |actual|
    return skip('AllocationStats is not available: skipping.') unless defined?(AllocationStats)
    @trace = actual.is_a?(Proc) ? AllocationStats.trace(&actual) : actual
    @trace.new_allocations.size < expected
  end

  def objects
    self
  end

  def supports_block_expectations?
    true
  end

  def output_trace_info(trace)
    trace.allocations(alias_paths: true).group_by(:sourcefile, :sourceline, :class).to_text
  end

  failure_message do |actual|
    "expected under #{ expected } objects to be allocated; got #{ @trace.new_allocations.size }:\n\n" << output_trace_info(@trace)
  end

  description do
    "allocates under #{ expected } objects"
  end
end

I'm rescuing LoadError in the initial require statement because I may not want to include AllocationStats on every test run (it tends to slow down the tests). I then define the :allocate_under matcher which performs the trace inside of the match block. The failure_message block is also important because it includes the to_text output from the AllocationStats trace right inside my failure message! The rest of the matcher is mostly standard RSpec configuration.

With my matcher loaded, I can now run my scenario from before, and watch it fail:

$ rspec spec/my_class_spec.rb 

MyClass
  when initializing
    should allocates under 6 objects (FAILED - 1)

Failures:

  1) MyClass when initializing should allocates under 6 objects
     Failure/Error: expect { MyClass.new }.to allocate_under(6).objects
       expected under 6 objects to be allocated; got 7:

               sourcefile           sourceline   class   count
       ---------------------------  ----------  -------  -----
       <PWD>/spec/my_class_spec.rb           6  MyClass      1
       <PWD>/lib/my_class.rb                 3  Array        1
       <PWD>/lib/my_class.rb                 4  String       5
     # ./spec/my_class_spec.rb:6:in `block (3 levels) in <top (required)>'

Finished in 0.15352 seconds (files took 0.22293 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/my_class_spec.rb:5 # MyClass when initializing should allocates under 6 objects

OK, so I've programmatically demonstrated the performance problem, which is that MyClass allocates extra string objects with the same value. Let's fix that issue by throwing those values into a frozen constant:

class MyClass
  DEFAULT = 'foo'.freeze

  def initialize
    @values = Array.new(5)
    5.times { @values << DEFAULT }
  end
end

Now that I've fixed the issue, I'll run my test again and watch it pass:

$ rspec spec/my_class_spec.rb

MyClass
  when initializing
    should allocates under 6 objects

Finished in 0.14952 seconds (files took 0.22056 seconds to load)
1 example, 0 failures

Next time I change the MyClass#initialize method, I can be confident that I'm not allocating too many objects.

Because profiling allocations can be relatively slow, it would be ideal to run these on-demand rather than all the time. Because I'm already gracefully handling allocation_stats being missing, I can use Bundler to create multiple gemfiles and then specify which gemfile I want to use with the BUNDLE_GEMFILE environment variable:

$ BUNDLE_GEMFILE=with_performance.gemfile bundle exec rspec spec/
$ BUNDLE_GEMFILE=without_performance.gemfile bundle exec rspec spec/

Another option is to use a library like the appraisal gem, which takes this same approach and solves some Bundler gotchas. Jason Clark gave an excellent presentation on how to do this at Ruby on Ales in March 2015; check out his slides to learn more.

I also think that maintaining these types of tests separately from my normal unit tests is a good idea, so I'll create a new "performance" directory so that my unit test suite resides in spec/unit/ and my performance suite resides in spec/performance/:

spec/
|-- spec_helper.rb
|-- unit/
|-- features/
|-- performance/

I am still refining my approach to profiling Ruby code for performance; my hope is that maintaining a performance test suite will help me improve the speed of my code now, keep it fast in the future, and create documentation for myself and others.