
Make Rails Associations Faster by Optimizing Named Blocks and String Callbacks

In our previous articles we described how Rails spends much of its time garbage collecting, and how a significant speedup can be achieved by memory profiling and fixing memory allocation hotspots. In this article, we'll describe a couple more such hotspots, dealing with named block parameters and associations, and provide the patches.

Named Block Parameters Considered Harmful (for Performance)

We already wrote that passing a block to a method of an ActiveRecord::Associations::HasManyAssociation instance and its friends chews up memory. For example, a single call to association.select { |record| record.new_record? } can allocate up to 10K of memory depending on the association size. A brief look at the associations source reveals that Rails itself has similar code in many places.

Each association is a proxy to the actual array of associated object(s). method_missing looks like a good way to implement the proxy pattern in Ruby, and indeed that's what Rails does. The proxy contains an array of associated objects and forwards every method it doesn't define itself to that array. If we simplify the Rails code, we'll see something like this:

class Association
    def method_missing(method, *args, &block)
        @array.send(method, *args, &block)
    end
end

At first, we couldn't understand why this would be slow, but after some digging we got it. Each named &block parameter requires extra processing. Ruby creates a Proc object that represents the block passed and attaches a Binding object with the local execution context to that Proc. In an empty Ruby script without any variables defined, a binding will be around 400 bytes. In an actual Rails application, bindings may grow up to 10K in size. Now imagine you're doing something with an AR object and its association in a loop 100 times. Bah! 1 megabyte of memory is gone.

Each Ruby block is a closure, and it captures its complete environment at the time of creation. Ola Bini has a great article on this. So is all hope lost? No -- it turns out that MRI has different implementations for named and anonymous block parameters. When calling a function that takes an anonymous block, it simply stores a reference to the caller's stack frame. It's OK to do that since the callee is guaranteed to exit before the caller's stack frame is popped. When calling a function that takes a named block, MRI assumes that this block may be long-lived and clones the environment right there. So anonymous block parameters are much more efficient than named block parameters. Also see the related discussion on Ruby Forum.
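To make the closure point concrete, here is a minimal sketch in plain Ruby. The method names (keep, call_now, make_block) are made up for illustration and are not part of Rails. It shows that a block received through a named parameter is a Proc whose binding still reaches every local of the calling method, even ones the block body never mentions:

```ruby
# Hypothetical helpers, for illustration only.

# Named block parameter: the block is handed to us as a Proc object.
def keep(&block)
  block
end

# Anonymous block: we can only invoke it with yield.
def call_now
  yield
end

def make_block
  payload = "x" * 10_000   # a local the block body never mentions
  keep { :unused }
end

blk = make_block
puts blk.class                                       # Proc
puts blk.binding.local_variable_get(:payload).size   # 10000
puts call_now { :done }                              # done
```

The second line of output is the interesting one: the 10,000-character string stays reachable for as long as the Proc is alive, which is why a Proc created in a busy Rails method can pin a lot of memory.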

The optimization to the Rails Association is simple: pass a new block and yield to the original one inside it:

class Association
    def method_missing(method, *args)
        @array.send(method, *args) { |*block_args| yield(*block_args) if block_given? }
    end
end
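For readers who want to try both variants outside Rails, here is a self-contained sketch. The class names NamedBlockProxy and YieldingProxy are hypothetical; they only mirror the simplified Association shown above, not the real Rails classes.

```ruby
# Hypothetical stand-ins for the simplified Association class.

class NamedBlockProxy
  def initialize(array)
    @array = array
  end

  # Original style: the block arrives as a named parameter.
  def method_missing(method, *args, &block)
    @array.send(method, *args, &block)
  end

  def respond_to_missing?(method, include_private = false)
    @array.respond_to?(method, include_private) || super
  end
end

class YieldingProxy
  def initialize(array)
    @array = array
  end

  # Optimized style: no named block; a fresh block yields to the caller's.
  def method_missing(method, *args)
    @array.send(method, *args) { |*block_args| yield(*block_args) if block_given? }
  end

  def respond_to_missing?(method, include_private = false)
    @array.respond_to?(method, include_private) || super
  end
end

named    = NamedBlockProxy.new([1, 2, 3, 4])
yielding = YieldingProxy.new([1, 2, 3, 4])

p named.select { |n| n.even? }      # [2, 4]
p yielding.select { |n| n.even? }   # [2, 4]
p yielding.inject(0) { |sum, n| sum + n }   # 10
```

One trade-off in this sketch: YieldingProxy always hands a block to the target method, so calls that rely on the absence of a block (for example, getting an Enumerator back from a block-less map) behave differently than they do with NamedBlockProxy.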

This not only saves memory, but also runs faster. I benchmarked it in Acunote by copying 120 objects (each with 6 associations) using ActiveRecord.

With named block parameters:

Benchmark Copy 120
memory: 97527K total in 1698240 allocations, GC calls: 13, GC time: 977 msec
time: 3.25 ± 0.05

With yields:

Benchmark Copy 120
memory: 92670K total in 1636677 allocations, GC calls: 12, GC time: 901 msec
time: 3.15 ± 0.05

As a result, about 5 megabytes of memory and 100 msec are saved for good.
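The figures above come from our own memory-profiling setup. For a rough comparison on a stock interpreter, something like the sketch below may help. It assumes a Ruby new enough to support GC.stat(:total_allocated_objects), and the helper and class names (allocated_objects, NamedForwarder, YieldForwarder) are made up for this example. Results depend heavily on the Ruby version; newer MRI releases have optimized block passing, so the gap may be small or absent there.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Forwards the block as a named parameter.
class NamedForwarder
  def initialize(array)
    @array = array
  end

  def select(&block)
    @array.select(&block)
  end
end

# Forwards by yielding from a fresh block.
class YieldForwarder
  def initialize(array)
    @array = array
  end

  def select
    @array.select { |item| yield(item) }
  end
end

data      = (1..10).to_a
named_fwd = NamedForwarder.new(data)
yield_fwd = YieldForwarder.new(data)

named_count = allocated_objects { 100.times { named_fwd.select { |n| n > 5 } } }
yield_count = allocated_objects { 100.times { yield_fwd.select { |n| n > 5 } } }

puts "named block: #{named_count} objects, yield: #{yield_count} objects"
```

This counts objects rather than bytes, so it is only a proxy for the memory figures in the benchmark output.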

That's Cool! Where's The Patch?

String Callbacks Considered Harmful (for Performance)

This one is even more interesting. Rails allows string callbacks in before_save, after_save, before_destroy and so on in ActiveRecord models. Each such callback is a string that is evaluated in the context of the AR object. Let me cite the Rails callbacks.rb source here:

...
def callback(method)
    notify(method)

    callbacks_for(method).each do |callback|
        result = case callback
            when Symbol
                self.send(callback)
            when String
                eval(callback, binding)
            when Proc, Method
                callback.call(self)
            else
            ...

You see, to evaluate the string we need to get the binding. And as we all remember from our named block parameter discussion, the binding takes memory. Even when you don't use string callbacks yourself, Rails associations automatically create them for you.

For example, has_many will define 4 string callbacks. You'll get before_save, after_create and after_update to ensure that new associated records are saved when their parent record is saved; you'll also get one for before_destroy that destroys dependent objects or nullifies their foreign keys.
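The difference between the two callback styles can be sketched in plain Ruby. The Record class and its run_callback and save_children methods below are hypothetical; the dispatch only imitates the case statement quoted from callbacks.rb and is not the Rails implementation.

```ruby
# Hypothetical model class, for illustration only.
class Record
  attr_reader :log

  def initialize
    @log = []
  end

  # Dispatch modeled on the `case callback` snippet quoted above.
  def run_callback(callback)
    case callback
    when Symbol
      send(callback)             # plain method dispatch
    when String
      eval(callback, binding)    # parses the string and needs a Binding
    when Proc, Method
      callback.call(self)
    else
      raise ArgumentError, "unsupported callback: #{callback.inspect}"
    end
  end

  def save_children
    @log << :children_saved
    true
  end
end

record = Record.new
record.run_callback("save_children")   # string callback
record.run_callback(:save_children)    # symbol callback, same effect
p record.log                           # [:children_saved, :children_saved]
```

Both calls end up in the same method, but only the string form has to parse source code and create a Binding on every invocation.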

Rewriting string callbacks into symbol callbacks gives a tangible performance boost. I did that change and benchmarked Acunote again.

With string callbacks in associations:

Benchmark Copy 120
memory: 92670K total in 1636677 allocations, GC calls: 12, GC time: 901 msec
time: 3.15 ± 0.05

With symbol callbacks in associations:

Benchmark Copy 120
memory: 39108K total in 944764 allocations, GC calls: 6, GC time: 479 msec
time: 2.45 ± 0.05

Whoa! Rewriting string callbacks to symbol callbacks saved 52 megabytes and gave a 0.7 sec speedup. Nice!

That's Cool! Where's The Patch?
