Tuesday, August 22, 2006

» Write it in C or face 10,000 slow deaths! +

Ruby is slow. Matz owns up to the fact. Benchmarks, like the Computer Language Shootout, show that it's true. Ruby is not too slow to be useful or anything, just slower (at some things) than many other languages, including perl and python. It is getting faster (1.9 is faster than 1.8.5 in many areas), but even so, when you need speed you often have to drop down to a lower-level language like C to get it. That's true for perl and python too, BTW.

But the question is, when? When will you actually gain speed by going to C? Well, always, duh! But more specifically, when will the speed increase be enough of a benefit to justify the evilness? When is the evil of C a necessary evil?

In a comp.lang.ruby post from July, Peter Hickman suggested porting an algorithm for finding Latin squares from perl to C for the speed increase. He limited his example to finding 5th order (5x5) squares. Dropping into C took him from 12 minutes for his optimized perl version down to 5 seconds for the C version. Now that seems like a major performance boost, to be sure. But something is hinky here (not to disparage Mr. Hickman, who has been coding longer than I've been breathing!).

Firstly, I noticed some time ago that when I wrote programs that print to standard output/error frequently, or with large amounts of data, they were quite slow. I always chalked that up to ruby being slow, but I found that the same was true for python. So I got to thinking about it. I tried writing to a file instead, and sure enough, the programs ran faster by orders of magnitude. Well duh! It makes sense that it is slower to go through all the various layers it takes to finally reach the screen than to go through the (relatively) transparent filesystem layer. So I figured I shouldn't print to the console except infrequently, and with small amounts of data.
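A quick way to check whether the console is your bottleneck (just a sketch of the idea, not code from the list): point $stdout at a file and run the program unchanged. Every bare puts then goes to the file instead of the terminal, so if the run time drops dramatically you know where the time was going.

$stdout = File.open("out.txt", "w")   # every bare puts now lands in out.txt
1_000.times { puts "lots of output that would normally hit the terminal" }
$stdout.close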

Secondly, I noticed that Mr. Hickman's C code was using printf to show each row of the squares. That is strange. The Latin square algorithm should be producing 161,280 unique squares. Hmmmm. Let's do the math here. Each square is 5 * 5, so 25 numbers. We have 161,280 unique squares. That means we have 4,032,000 numbers (each a single digit from 1 to 5, so one byte apiece, ignoring separators). Now there are 1,048,576 bytes in a meg, so we should have about 3.84 megs of total data.

ruby -e "puts 161_280 * 25 / 1_048_576.0"
> 3.84521484375

OK, now let's just assume that C is so fast that it only takes 1 second to do all the maths and 4 seconds to print the results. That's almost a meg a second of throughput! That's fa-a-a-st! But that can't be right! How could ruby and python be so slow at console output and C be so fast? Well, it turns out that it isn't. I wrote a simple test case in C that prints a 10-byte string to the console 15,000 times (150k of data) using a simple for loop with puts, and lo and behold, it took about 25 seconds. The same program in ruby (I did it in python also) took 26 seconds. So ruby-to-console throughput is not much slower than C after all.
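If you want to try this at home, here is a sketch along the same lines (not my exact test script): it writes a 10-byte string 15,000 times, once to standard output and once to a file, and reports the wall-clock time for each. Your numbers will vary wildly with your terminal emulator and machine.

require 'benchmark'

line = "0123456789"   # 10 bytes (puts adds the newline)
n    = 15_000

stdout_time = Benchmark.realtime { n.times { puts line } }
file_time   = Benchmark.realtime do
  File.open("throughput.txt", "w") { |f| n.times { f.puts line } }
end

warn format("stdout: %.2fs  file: %.2fs", stdout_time, file_time)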

But this raises the question: How fast would the ruby version of the algorithm be if it printed to a file rather than to standard output? Well, the faster of the two algorithms mentioned on the list (the one using the pure-ruby Permutation class and the built-in Set class), which takes about 26 minutes to run on my machine when printing to standard output, takes a whopping 17 seconds when using file IO instead! (Actually, you can whittle it down to 15 seconds if you collect all the results into an array and do a single IO write of the join'd array -- a memory-for-speed tradeoff.)
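To make that concrete, here is a rough sketch of the whole exercise. This is not Mr. Hickman's code or the code from the list; it assumes a ruby with Array#permutation built in (the pure-ruby Permutation class can stand in otherwise). It backtracks over permutation rows, collects every square, and writes them all to a file with one big write.

ORDER = 5
ROWS  = (1..ORDER).to_a.permutation.to_a   # the 120 candidate rows

def extend_square(rows, squares)
  if rows.size == ORDER
    squares << rows.map { |r| r.join(" ") }.join("\n")
    return
  end
  ROWS.each do |p|
    # a new row may not repeat any value in any column
    next if rows.any? { |r| r.each_index.any? { |i| r[i] == p[i] } }
    extend_square(rows + [p], squares)
  end
end

squares = []
extend_square([], squares)

# one big write instead of 161,280 little ones -- the memory-for-speed tradeoff
File.open("latin5.txt", "w") { |f| f.puts squares.join("\n\n") }
puts "#{squares.size} squares"   # should print 161280

Swap the File.open line for plain puts calls on each square and you can see the console penalty for yourself.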

Now I don't know about you, but I can live with 17 seconds, rather than trying to muck around with writing a C extension or (dread!) writing the whole program in C. Shoot, it would take me (I'm no C guru) a lot longer than 17 seconds just to figure out which headers I'd need to include!

So don't just assume that your bottleneck is the ruby interpreter! It may be the case that C would give you no great advantage at all, and thus would not be a necessary evil, just an evil! ;)

For source code and such, see my mailing list post.

Postscript: This also makes me wonder how many benchmarks in the above-mentioned shootout might be printing to standard output. Of course, if all versions of a given benchmark program did so, then I suppose it would still be fair, but it would still add unnecessary lag to all of them (and the benchmarks, I think, are meant to test only specific aspects of the languages: number-crunching, object creation/manipulation, and so forth).

Addendum: Jürgen Strobel made a good point in this regard. My numbers are biased by the terminal emulator, which is exactly what I was trying to demonstrate -- the same bias applies to the C version, so the problem was not ruby being slow (i.e., "ruby takes 26 minutes because it is slow"). On the same note, using a regular tty rather than an xterm, and writing to stdout, the ruby program takes 23 seconds. That's only 6 seconds longer than writing to a file, which is not too bad. So I amend my suggestion: don't write large amounts of data to stdout when it's a terminal emulator, as opposed to a real tty.
