  The infamous SCSI vs IDE mutt test

 by Remi Gauvin 2003-10-23
Note: This was originally posted to the mailing list, but has been made available here.

     Someone whose name I forget recently got his 15 minutes of fame by posting a "benchmark" comparing a SCSI drive and an IDE drive with similar performance ratings. There are plenty of IDE / SCSI comparisons out there which are much more scientific, but the results of this one interested me. The author claimed that opening a maildir mailbox with 50,000 messages in mutt took 7 minutes on his IDE drive, but under 2 minutes on his SCSI drive.

     This test is very unscientific, but I decided, what the hey, I have both IDE and SCSI drives in my system, so why not try it out myself? In my case, the IDE drive is significantly faster, since it sits on a newer UDMA interface while the SCSI drives are on an older UltraSCSI bus. My IDE drive can sustain a maximum transfer rate of over 40MB/s, whereas my SCSI drives run at 22MB/s (as measured by the equally unscientific hdparm -tT). This test was done with kernel 2.4.22.

     My IDE drive is formatted with ReiserFS and mounted with noatime. (I originally tried this test with default settings. Thanks to Andrew's link [ed: http://robert.timetraveller.org/talks/optimisation/] for reminding me of the noatime option. It makes a significant difference when you're scanning over 30,000 files.)
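
     To confirm that a filesystem really is mounted with noatime, one option is to look at the option list in the kernel's mount table. The following is only a minimal sketch: the function name mount_options, the device /dev/hda1 and the mount point /mail are made up for illustration, and the sample line assumes the usual "device mountpoint fstype options dump pass" layout used by /proc/mounts on Linux.

```python
def mount_options(mounts_text: str, mount_point: str) -> list:
    """Return the mount options for `mount_point`, or [] if it is not listed.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated, options in the fourth field.
    """
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If a mount point appears more than once, the last entry wins.
            options = fields[3].split(",")
    return options


if __name__ == "__main__":
    # Hypothetical entry, not taken from the system described in the article.
    sample = "/dev/hda1 /mail reiserfs rw,noatime 0 0\n"
    print("noatime" in mount_options(sample, "/mail"))
```

     On a Linux box the same function could be fed the text read from /proc/mounts instead of the sample string.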

     I created a maildir mailbox into which I copied almost 60,000 e-mail messages from the Samba mailing list (so they were all small files).
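
     For anyone who wants to repeat the experiment without a large mailing list archive at hand, a test maildir full of small messages can be generated. This is a rough sketch using Python's standard mailbox module, not the way the mailbox above was built: the function name build_test_maildir, the directory name testbox and the example.invalid addresses are all placeholders.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def build_test_maildir(root: Path, count: int) -> Path:
    """Create a maildir under `root` holding `count` small synthetic messages."""
    path = root / "testbox"  # placeholder directory name
    box = mailbox.Maildir(str(path), create=True)
    for i in range(count):
        msg = EmailMessage()
        msg["From"] = "sender@example.invalid"  # placeholder address
        msg["To"] = "list@example.invalid"      # placeholder address
        msg["Subject"] = f"test message {i}"
        msg.set_content("A short body, so every file stays small.\n")
        box.add(msg)  # maildir stores each message as its own file
    box.close()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        maildir = build_test_maildir(Path(tmp), 200)
        print(len(list((maildir / "new").iterdir())), "messages written")
```

     Synthetic messages written this way will not necessarily reproduce the on-disk layout that Mutt or a mail server would create, so treat any timings from such a mailbox as a separate data point.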

     To make a much too long story shorter, I saw results very similar to those posted in the aforementioned test on Slashdot. Opening the files took forever. This behaviour was not limited to Mutt. Even copying the files from the IDE drive to the SCSI drive took an inordinate amount of time, with a max sustained throughput of 700KB/s. Reading the same files from the SCSI drive was much faster, with a sustained throughput of over 7MB/s.
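
     The throughput figures above came from watching ordinary copies, but the same kind of number can be obtained by reading every file in a directory tree and timing it. Here is a minimal sketch; read_throughput is a name invented for this example, and the result only says something about the disk if the files are not already sitting in the page cache (for instance right after a reboot or a remount).

```python
import os
import sys
import time
from pathlib import Path


def read_throughput(directory: Path) -> tuple:
    """Read every regular file under `directory`.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    files = 0
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                total += len(fh.read())
            files += 1
    return files, total, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    count, size, seconds = read_throughput(Path(sys.argv[1]))
    rate = size / seconds / 1024 if seconds > 0 else float("inf")
    print(f"{count} files, {size} bytes, {seconds:.1f}s, {rate:.0f} KB/s")
```

     Running it once against the mailbox on each drive gives a directly comparable KB/s figure without involving Mutt at all.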

     This didn't make any sense to me whatsoever, so I decided to bang my head against it for a few days.

     When I got tired of copying and recreating this huge mail folder, I eventually created a tarball of the thing. Now I discovered something interesting. Untarring the files on the IDE drive only took a handful of seconds. Strange, I thought, considering that it took me over 10 minutes to create it in the first place. This is when I finally noticed that reading the files from my new untarred copy was as fast as I would expect. Copying the files or even opening them in mutt now took less than a minute, with a throughput of over 10MB/s...

*bing*


So, here, at long last, the conclusion...

     For some reason completely unknown to me (I should probably repeat this message on some kernel and/or ReiserFS lists), creating a directory with thousands of files from within Mutt fragments the filesystem so badly that ReiserFS can barely read the files afterwards. Since this drive is mostly unused, I can think of no reason why this would happen at all. Writing the files in any other way, including with tar or cp, results in a directory that does not suffer this handicap. My next test will be to e-mail myself 30,000 messages and see what happens when exim writes the files.

     My performance tuning tip, at the end of all this: if you use maildir format mailboxes with a large number of messages, you can improve your performance by occasionally backing up, deleting, and restoring your mail directory.
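
     The backup, delete and restore cycle can be scripted. What follows is a sketch under a few assumptions rather than a tested maintenance tool: rewrite_maildir is a made-up name, the backup file must live outside the mail directory, and nothing (mail client or delivery agent) should be writing to the mailbox while it runs. It refuses to delete anything unless the archive contains as many entries as the directory tree does.

```python
import os
import shutil
import tarfile
from pathlib import Path


def rewrite_maildir(maildir: Path, backup: Path) -> None:
    """Archive `maildir` to `backup`, delete it, then restore it from the archive."""
    maildir = maildir.resolve()

    # 1. Back up the whole directory tree into an uncompressed tar file.
    with tarfile.open(backup, "w") as tar:
        tar.add(maildir, arcname=maildir.name)

    # 2. Sanity check: the archive should list the top directory plus
    #    every file and subdirectory currently on disk.
    on_disk = 1
    for _dirpath, dirnames, filenames in os.walk(maildir):
        on_disk += len(dirnames) + len(filenames)
    with tarfile.open(backup, "r") as tar:
        archived = len(tar.getnames())
    if archived != on_disk:
        raise RuntimeError("backup looks incomplete; mail directory left untouched")

    # 3. Delete the original and write it back out in one pass.
    shutil.rmtree(maildir)
    with tarfile.open(backup, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support (and prefer) an extraction filter.
            tar.extractall(maildir.parent, filter="data")
        else:
            tar.extractall(maildir.parent)
```

     The backup file is deliberately left in place afterwards, so there is still a copy if the restore turns out to be wrong.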

     For the sake of completeness, I tried to untar the file onto one of the SCSI drives, formatted and mounted as ext3. I don't know how many people have tried doing something like this. The results are entertaining, if not very practical. The throughput started out fine, then dropped to under 500KB/s. In this case, the limiting factor was the CPU, which sat at 100% kernel utilization while writing to a SCSI disk with what was supposed to be the less CPU-intensive filesystem. I finally killed the process when only 30,000 files had been restored. Strangely enough, reading the files (like, say, opening the mail folder in Mutt) performed better on ext3 than it did on ReiserFS, by about 5-10%. Any operation that modified the directory structure, however, was simply out of the question.

