Note: This was originally posted to the mailing list, but has been made available here.
Someone whose name I forget recently got his 15 minutes of fame by
posting a "benchmark" comparing the performance of a SCSI and IDE
drive of similar performance ratings. There are plenty of IDE / SCSI
comparisons out there which are much more scientific, but the results
of this one interested me. The author claimed that opening a maildir
mailbox with 50,000 messages in mutt took 7 minutes on his IDE drive,
but under 2 minutes on his SCSI drive.
This test is very unscientific, but I decided, what the hey, I have
both IDE and SCSI drives in my system, why not try this out myself?
In my case, the IDE drive is significantly faster, because it sits on
a newer UDMA interface while the SCSI drives hang off an older
UltraSCSI bus. My IDE drive can sustain a max transfer of over 40MB/s
whereas my SCSI drives can run at 22MB/s (as measured by the equally
unscientific hdparm -tT).
This test was done with kernel 2.4.22.
My IDE drive is formatted with ReiserFS and mounted with noatime. (I
originally tried this test with default settings. Thanks to Andrew's
link [ed: http://robert.timetraveller.org/talks/optimisation/]
for reminding me of the noatime option. This makes a significant
difference when you're scanning over 30,000 files.)
I created a maildir mailbox into which I copied almost 60,000 e-mail
messages from the Samba mailing list (so they were all small files).
To make a much too long story shorter, I saw results very similar to
those posted in the aforementioned test on Slashdot. Opening the
files took forever. This behaviour was not limited to Mutt. Even
copying the files from the IDE drive to the SCSI drive took an
inordinate amount of time, with a max sustained throughput of
700KB/s. Reading the same files from the SCSI drive was much faster,
with a sustained throughput of over 7MB/s.
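
For anyone who wants to repeat the read test without Mutt in the way, here is a minimal sketch of how the throughput could be measured. It is not the method used above (those numbers came from watching copies and Mutt); the function name read_throughput is made up for illustration. It also assumes the files are not already sitting in the page cache, so run it after a fresh mount or a reboot, otherwise you are timing RAM rather than the disk.

```python
import os
import time


def read_throughput(directory):
    """Read every file under `directory` once.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    files = 0
    total = 0
    start = time.perf_counter()
    # A maildir keeps its messages in the cur/, new/ and tmp/
    # subdirectories, so walk the whole tree.
    for root, _dirs, names in os.walk(directory):
        for name in names:
            with open(os.path.join(root, name), "rb") as fh:
                total += len(fh.read())
            files += 1
    elapsed = time.perf_counter() - start
    return files, total, elapsed
```

Dividing total_bytes by elapsed_seconds gives bytes per second, which can be compared between the freshly written directory and the untarred copy.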
This didn't make any sense to me whatsoever, so I decided to bang my
head against it for a few days.
When I got tired of copying and recreating this huge mail folder, I
eventually created a tarball of the thing. Now I discovered something
interesting. Untarring the files on the IDE drive only took a handful
of seconds. Strange, I thought, considering that it took me over 10
minutes to create it in the first place. This is when I finally
noticed that reading the files from my new untarred copy was as fast
as I would expect. Copying the files or even opening them in mutt now
took less than a minute, with a throughput of over 10MB/s...
*bing*
So, here, at long last, the conclusion......
For some reason, completely unknown to me (I should probably repeat
this message on some kernel and/or ReiserFS lists), creating a
directory with thousands of files from within Mutt fragments the
filesystem so badly that ReiserFS can barely read the files afterwards.
Since this drive is mostly unused, I can think of no reason why this
would happen at all. Writing the files in any other way, including by
tar or cp, results in a directory that does not suffer this handicap.
My next test will be to e-mail myself 30,000 messages and see what
happens when exim writes the files.
My performance tuning tip, at the end of all this: if you use maildir
format mailboxes with a large number of messages, you can improve your
performance by occasionally backing up, deleting, and restoring your
mail directory.
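
The backup, delete, and restore cycle can be sketched as follows. This is only an illustration under a few assumptions: the function name rewrite_maildir is made up, the archive must live outside the mail directory, and nothing may be delivering to or reading the mailbox while it runs, since anything that arrives between the backup and the delete is lost. The original is only deleted after the archive's file count matches what is on disk.

```python
import os
import shutil
import tarfile


def rewrite_maildir(maildir, archive_path):
    """Archive `maildir`, delete it, then restore it from the archive.

    Assumes `archive_path` is outside `maildir` and that no mail is
    being delivered while this runs. Returns the number of files restored.
    """
    maildir = os.path.abspath(maildir)
    parent, name = os.path.split(maildir)

    # 1. Back up: store the directory under its own name in the tarball.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(maildir, arcname=name)

    # 2. Check the archive holds as many regular files as the directory.
    on_disk = sum(len(files) for _root, _dirs, files in os.walk(maildir))
    with tarfile.open(archive_path, "r") as tar:
        archived = sum(1 for member in tar.getmembers() if member.isfile())
    if archived != on_disk:
        raise RuntimeError("archive is incomplete; leaving maildir untouched")

    # 3. Delete the original directory.
    shutil.rmtree(maildir)

    # 4. Restore: the files are written back sequentially, the same way
    #    untarring by hand would write them.
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(parent)
    return archived
```

Keep the tarball until you have opened the restored mailbox and are satisfied everything is there.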
For the sake of completeness, I tried to untar the file onto one of
the SCSI drives formatted and mounted as ext3. I don't know how many
people have tried doing something like this. The results are
entertaining, if not very practical. The throughput started out fine,
then dropped to under 500KB/s. In this case, the limiting factor was
the CPU, which sat at 100% kernel time while writing to a SCSI disk
with what was supposed to be the less CPU-intensive filesystem. I
finally killed the process when only 30,000 files had
been restored. Strangely enough, reading the files (like say, opening
the mail folder in Mutt) performed better on ext3 than it did with
Reiser, by about 5 - 10%. Any operation that modified the directory
structure, however, was simply out of the question.