
The SmartLogic Blog


Benchmark Ruby Code with R, rsruby and better-benchmark

October 8th, 2008 by John Trupiano

I’ve found myself on a benchmarking kick these last couple of weeks. Sometime last week, I dug up the better-benchmark library written by Pistos. Pistos’ library is basically just a wrapper for the rsruby gem, which is more or less an interface to R (similar to what rmagick is to ImageMagick).

Combining these tools, we can do some pretty nifty code performance analysis in very few lines of code. For example:

require 'rubygems'
require 'better-benchmark'

result = Benchmark.compare_realtime(:iterations => 10) { |iteration|
  save_the_world()
}.with { |iteration|
  save_the_world_and_save_the_girl()
}
Benchmark.report_on result

I have forked better-benchmark and wrapped the library up into a RubyGem.

Currently, the gem is available from github:

$> sudo gem install jtrupiano-better-benchmark

Based on preliminary discussions with Pistos, he intends to merge my changes back into his branch. I’ll update this post with the relevant details when that has been completed.

So, let’s get started by preparing your box for better-benchmark!

Installing R

On Mac OS X

This one’s simple. AT&T Research provides us with a DMG.

On Ubuntu 8.04

I first built this on a Mac without any trouble, but then decided I wanted a dedicated VM where I could run some fairly intense benchmarks. You’ll need to ensure you have the following apt packages installed:

* build-essential
* g77 (fortran compiler)
* x11-common

There may be a few other required packages; I started from a pre-built box that was more or less ready to host Rails apps, so I may have missed some prerequisites. The following steps install R from source on Ubuntu (I was unable to get the necessary R headers installed via apt-get or aptitude) and prepare you for installing the rsruby gem.

$> sudo apt-get install build-essential g77 x11-common
$> cd /opt
$> wget http://lib.stat.cmu.edu/R/CRAN/src/base/R-2/R-2.7.2.tar.gz
$> tar xzf R-2.7.2.tar.gz
$> cd R-2.7.2
# --enable-R-shlib is important...it signals the installer to build and make available libR.so
$> ./configure --enable-R-shlib
$> make
$> sudo make install

Installing rsruby

Assuming you installed R fine, rsruby should be easily installed using gem. If you have trouble, see the installation instructions on the project’s README.

Note that you’ll need to set the R_HOME environment variable prior to installing the gem. On Mac, R_HOME=/Library/Frameworks/R.framework/Resources. On Ubuntu (when installed from source), R_HOME=/usr/local/lib/R. I find it helpful to just drop this into /etc/environment (on Ubuntu) so that this variable is set upon login.

$> export R_HOME=/path/to/R/for/your/OS
$> sudo gem install rsruby -- --with-R-dir=$R_HOME

Installing better-benchmark

Currently, you can get this directly off of my fork on github.

$> gem sources -a http://gems.github.com/
$> sudo gem install jtrupiano-better-benchmark

Great, I have it. Now what do I do with it?

So now we’re ready to benchmark something. The most important thing to understand when benchmarking is that you need to clearly identify what is part of the benchmark, and what is not. Let’s take a look at a real-world example. We’ll go through the following steps:

  1. Hypothesize: define exactly what you’re looking to test, and take special care to describe what you are NOT testing.
  2. Plan: write out step by step how you will accomplish your benchmark
  3. Refine: identify which steps belong in the benchmark, and which are really setup/teardown aspects of the benchmark, and thus shouldn’t be included in the test.
  4. Test: run the benchmarks
  5. Rinse/Repeat: (as necessary) tweak your test parameters, tweak the tests, re-test
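The steps above can be sketched as a minimal harness. This is only a sketch using Ruby’s standard Benchmark module rather than better-benchmark (so there’s no statistical comparison), and the string-joining workload is purely illustrative:

```ruby
require 'benchmark'

# Hypothesis (step 1): Array#join is faster than repeated String#+
# concatenation. (Illustrative workload only.)

# Refine (step 3): build the input OUTSIDE the timed regions so that
# setup cost never leaks into the measurement.
words = Array.new(10_000) { |i| "word#{i}" }

# Test (step 4): time only the operations under study.
concat_time = Benchmark.realtime do
  result = ""
  words.each { |w| result += w }
end

join_time = Benchmark.realtime do
  words.join
end

puts format("String#+  : %.4fs", concat_time)
puts format("Array#join: %.4fs", join_time)
```

Step 5 (rinse/repeat) would mean varying the input size and iteration count and re-running; better-benchmark’s value over this sketch is that it runs many iterations for you and tells you whether the difference is statistically significant.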

Real world example

I was building a set of services that needed to download tens to hundreds of thousands of feeds regularly and perform some post-processing. I pulled in the feed-normalizer gem. My implementation plan called for me to write separate services, one for “pulling down” the feed, and one for “processing” the feed.

My first approach entailed storing the feeds on the hard drive as YAML (using the to_yaml method). After running into some obscure problems with YAML and multi-line strings, I started to explore alternative persistence formats. One that caught my attention was using the Marshal standard library to store the content in its binary format. This brings me to my hypothesis:

1) Hypothesis: Converting FeedNormalizer::Feed objects to binary form using the Marshal library is faster than converting those same objects to YAML using the YAML library.

I also like to list out my goals at this stage:

  1. We only want to test the #dump conversion (object –> string) and the #load conversion (string –> object). We do not want to test the write-to-disk and read-from-disk portions.
  2. We specifically want to test FeedNormalizer::Feed objects, since that’s what we’re using in our code.
  3. Let’s benchmark the YAML and Marshal #dump and #load methods specifically on FeedNormalizer::Feed objects.
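Since the FeedNormalizer::Feed objects and the feed corpus aren’t reproduced here, here’s a minimal stand-in using a plain Hash with feed-like fields, just to show the two serialization APIs under comparison side by side:

```ruby
require 'yaml'

# Stand-in for a FeedNormalizer::Feed: a plain Hash, since the real
# objects and raw feed corpus aren't available in this snippet.
feed = {
  "title"   => "Example Feed",
  "entries" => [
    { "title" => "Post 1", "body" => "line one\nline two" },
    { "title" => "Post 2", "body" => "more text" }
  ]
}

yaml_string    = YAML.dump(feed)     # object --> human-readable string
marshal_string = Marshal.dump(feed)  # object --> binary string

# Both formats round-trip back to an equal object.
puts YAML.load(yaml_string) == feed
puts Marshal.load(marshal_string) == feed
```

The benchmark below times exactly these two pairs of calls, only on real Feed objects instead of a toy Hash.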

2) Plan:

  1. We have already grabbed roughly 100 feed downloads; they are sitting as raw XML in test/raw_rss.
  2. We’ll read them in, then benchmark YAML.dump vs. Marshal.dump.
  3. Then we’ll independently benchmark YAML.load vs. Marshal.load.

3) Refine:
Here’s where we need to ensure that unnecessary processing doesn’t make its way into our benchmarks. For instance, in order to benchmark Marshal#dump against YAML#dump, we’ll need to load up all of the FeedNormalizer::Feed objects (by reading them from disk and instantiating them) prior to starting the benchmark.

4) Test:
It’s a ruby script, so go ahead and run it. The set of options provided by better-benchmark are easily discernible from the source file.

5) Rinse/Repeat:
Same story as always.

Real Script Example

#!/usr/bin/env ruby
# Author: John Trupiano
# 2008-09-28
#
# In order to run this benchmark, you need to have the better-benchmark library installed (http://github.com/Pistos/better-benchmark/tree/master)
# better-benchmark depends on the rsruby gem (http://web.kuicr.kyoto-u.ac.jp/~alexg/rsruby/)
# rsruby depends on an installation of the computing package R (http://mirrors.ibiblio.org/pub/mirrors/CRAN/)
#
# For details on setting up R, rsruby, and better-benchmark, see this blog post: http://blog.smartlogicsolutions.com/2008/10/08/benchmark-ruby-code-with-r-rsruby-and-better-benchmark

# This is intended to be run from RAILS_ROOT, e.g.
# /path/to/rails/root $> ruby test/benchmark/yaml_bytecode_test.rb

require 'openssl'
require 'yaml'
require 'rubygems'
require 'feed-normalizer'
require 'lib/rss_parser'
require 'better-benchmark'

# GOALS 
# 1) We only want to test the #dump conversion (object --> string) and the #load conversion (string --> object).  We do not want to test the write to disk and the read from disk portions.
# 2) We specifically want to test FeedNormalizer::Feed objects, since that's what we're using in our code.
# 3) Let's benchmark the YAML / Marshal #dump and #load methods specifically on FeedNormalizer::Feed objects

# PLAN
# 1) We've grabbed roughly 100 feed downloads sitting in test/raw_rss.
# 2) We'll read them in, then benchmark YAML.dump vs. Marshal.dump
# 3) Then we'll independently benchmark YAML.load vs. Marshal.load

fn_feeds = {}
root_dir = File.join("test", "raw_rss")
feed_ids = (1..62).to_a + (64..108).to_a

# build a global hash fn_feeds that contains the FeedNormalizer::Feed entries
feed_ids.each do |feed_id|
  infile = File.join(root_dir, feed_id.to_s, feed_id.to_s + '_dump.rss')
  fn_feeds[feed_id] = RssParser.parse(infile, :file)
end

#### BENCHMARK 1 ####
result = Benchmark.compare_realtime(
  :iterations => 25,
  :verbose => true
) { |iteration|
  fn_feeds.each_pair do |feed_id, feed|
    YAML.dump(feed)
  end
}.with { |iteration|
  fn_feeds.each_pair do |feed_id, feed|
    Marshal.dump(feed)
  end
}
Benchmark.report_on result


# now, let's create separate collections storing the dumps
yaml_dumps = {}
marshal_dumps = {}
fn_feeds.each_pair do |feed_id, feed|
  yaml_dumps[feed_id] = YAML.dump(feed)
  marshal_dumps[feed_id] = Marshal.dump(feed)
end

#### BENCHMARK 2 ####
result = Benchmark.compare_realtime(
  :iterations => 25,
  :verbose => true
) { |iteration|
  yaml_dumps.each_pair do |feed_id, yaml|
    YAML.load(yaml)
  end
}.with { |iteration|
  marshal_dumps.each_pair do |feed_id, marshal|
    Marshal.load(marshal)
  end
}
Benchmark.report_on result

And for good measure, let’s take a look at the results.

john-trupianos-macbook-pro:trunk john$ ruby test/benchmark/yaml_bytecode_test.rb 
.........................
Set 1 mean: 2.095 s
Set 1 std dev: 0.071
Set 2 mean: 0.123 s
Set 2 std dev: 0.019
p.value: 1.58214572048972e-14
W: 625.0
The difference (-94.1%) IS statistically significant.
.........................
Set 1 mean: 0.191 s
Set 1 std dev: 0.015
Set 2 mean: 0.022 s
Set 2 std dev: 0.012
p.value: 1.58214572048972e-14
W: 625.0

As R plainly tells us, the difference is statistically significant for both benchmarks (those who remember their p-values from stats class, raise your hands; note that I did run two separate benchmarks in this example, one for dumping and one for loading). That said, considering I don’t have a need for this data to be human-readable on my filesystem, I can safely conclude that using the Marshal library in lieu of the YAML library will give me a performance boost on both the dump and load methods. Now, whether or not this boost is negligible in the scope of the greater system, well, that’s a question for a separate benchmark.

(What I mean by this last part is that this #dump/#read portion of my whole system may be tiny. If it only represents 0.5% of the processing time, then the improvement my example shows may be more or less negligible in the context of the whole system. These are the types of questions you need to ask yourself when benchmarking.)
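The arithmetic behind that caveat is Amdahl’s law: if the serialization step is a fraction p of total runtime and you speed it up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A quick sketch, using the hypothetical 0.5% figure from above and the rough 17x dump speedup from Benchmark 1:

```ruby
# Amdahl's law: speeding up a fraction p of the runtime by a factor s
# improves the system as a whole by 1 / ((1 - p) + p / s).
def overall_speedup(p, s)
  1.0 / ((1.0 - p) + p / s)
end

# Dump went from ~2.095s to ~0.123s in Benchmark 1: roughly 17x faster.
s = 2.095 / 0.123

# If serialization is only 0.5% of total runtime, the win is negligible...
puts format("p = 0.5%%  -> %.4fx overall", overall_speedup(0.005, s))
# ...but if it's half the runtime, it's nearly a 2x win.
puts format("p = 50%%   -> %.4fx overall", overall_speedup(0.5, s))
```

So a 94% improvement in a 0.5% slice buys you well under 1% overall, which is exactly why the follow-up benchmark of the whole pipeline matters.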

Primary take-home point: know what you’re benchmarking, and benchmark what you don’t know.

  • http://blog.purepistos.net Pistos

    Nice comprehensive post. Thanks for using better-benchmark! There’s one thing I wanted to point out, though: [Statistics experts caution us](http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test#Test_procedure) that using a high number of pairs in the Wilcoxon test causes the calculated probability(ies) to tend toward a normal approximation. In our case, the number of pairs is the number of better-benchmark :iterations, and “high” is a subjective number, ranging from 15 to 25. In your code, you used 100 and, as you can see, your p values were 0.0 and W were 10000. If I’m not mistaken, these are not usual numbers to get out of the Wilcoxon test. In contrast, see how I run no more than 20 outer iterations in [my uses of better-benchmark](http://blog.purepistos.net/index.php?s=better-benchmark), choosing instead to increase the time spent per iteration by doing inner iterations. My p values are always non-zero (though sometimes extremely close to zero), and my W values are in the hundreds.

    Having said all that, I don’t expect that reducing the iteration count will change the conclusion (of which method is faster), or the statistical significance of the results, but it would raise my confidence in the conclusion. Perhaps a statistics expert could shed more light on this?

  • http://www.smartlogicsolutions.com/wiki/John_Trupiano John Trupiano

    Thanks for calling this out Pistos. It would be interesting to explore the mathematical basis for a lot of this a little further (it’s been several years since I took a stats course, so I have some brushing up to do). I’m particularly excited about what else rsruby can make available to us via R (pretty graphs, charts, etc.). better-benchmark is a great start, but I think there’s an opportunity to beef it up a bit more (just not sure how and with what yet….back to the drawing board).

    I’ve gone ahead and updated this post with results for performing only 25 iterations as per your suggestion. Note that the p-values this time around are still practically 0.

    Last but not least, thanks for creating better-benchmark in the first place!

  • http://betterlogic.com/roger roger

    Fascinating use of ruby.

  • http://betterlogic.com/roger roger

    should advertise package in ruby talk :)

John Trupiano co-founded SmartLogic with Yair Flicker in May 2005 and was co-president through 2011. Check out his GitHub Projects or follow @jtrupiano on Twitter.
