SmartLogic (443) 451-3001

The SmartLogic Blog

SmartLogic is a web and mobile product development studio based in Baltimore. Contact us for help building your product or visit our website to learn more about what we do.

Mount options to improve ext4 file system performance

June 4th, 2009 by

I recently cut my Rails test suite’s running time by around 30% by adding certain mount options to my ext4 partition (this works for ext3 too). I thought I’d blog about it because the first time I tried, my system wouldn’t boot! So here are step-by-step instructions:

2) Run (as root):
> tune2fs -o journal_data_writeback /dev/sdXY
where /dev/sdXY is the partition you want to speed up.
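To confirm that the superblock change took effect, you can list the superblock with tune2fs -l /dev/sdXY (as root) and look for journal_data_writeback on its "Default mount options:" line. The helper below is a minimal sketch of that check; the function name has_writeback_default is hypothetical, and it just reads tune2fs -l output from stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if tune2fs -l output (on stdin) lists
# journal_data_writeback among the superblock's default mount options.
# Usage (as root):
#   tune2fs -l /dev/sdXY | has_writeback_default && echo "writeback is the default"
has_writeback_default() {
    # Keep only the "Default mount options:" line, then look for the option
    # as a whole word so longer names containing it do not match.
    grep '^Default mount options:' | grep -qw 'journal_data_writeback'
}
```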

4) Edit fstab

> nano -w /etc/fstab

Find the line that references sdXY. It will look something like:

# /dev/sda2
UUID=be2f0ac2-4683-4550-bcd1-704a1a840b3e / ext4 relatime,errors=remount-ro 0 1

The first field is the device, given here by UUID (on your system it may simply be /dev/sdXY). The second is the mount point (/ for me), the third is the file system type (ext3 or ext4), and the fourth is the list of mount options. The fifth is used by dump and the sixth is the fsck pass number. See fstab(5) for details.
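As a rough illustration of that layout, here is a shell sketch that splits a single fstab entry into its six whitespace-separated fields, using the example line above as input. The function name describe_fstab_line is hypothetical; the variable names follow the fstab(5) field names, and it assumes the entry has already been separated from any comment lines.

```shell
#!/bin/sh
# Hypothetical helper: print the six fields of one fstab entry.
# read splits on whitespace; anything after the sixth field lands in "rest".
describe_fstab_line() {
    printf '%s\n' "$1" | {
        read -r spec file vfstype mntops freq passno rest
        printf 'device:  %s\nmount:   %s\ntype:    %s\noptions: %s\ndump:    %s\npass:    %s\n' \
            "$spec" "$file" "$vfstype" "$mntops" "$freq" "$passno"
    }
}

describe_fstab_line 'UUID=be2f0ac2-4683-4550-bcd1-704a1a840b3e / ext4 relatime,errors=remount-ro 0 1'
```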

Change the options to:

noatime,data=writeback,barrier=0,nobh,errors=remount-ro

(If you have other options that I don’t, you can leave them in place.)
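If you would rather not hand-edit the file, the hypothetical helper below prints a copy of an fstab with the options field of one entry replaced, so you can compare it against the original before installing it. It is a sketch that assumes a standard six-field fstab: lines starting with # pass through untouched, and awk rewrites the edited line with single spaces between fields.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with the options (4th field) of the entry
# whose mount point (2nd field) equals MOUNTPOINT replaced by NEWOPTS.
# Nothing is written back; redirect the output and review it first.
# Usage: set_fstab_options FILE MOUNTPOINT NEWOPTS
set_fstab_options() {
    awk -v mp="$2" -v opts="$3" '
        # Pass blank lines and comments through unchanged.
        NF == 0 || $1 ~ /^#/ { print; next }
        # On the matching entry, replace field 4. Assigning a field makes awk
        # rebuild the line with single spaces between fields.
        $2 == mp { $4 = opts }
        { print }
    ' "$1"
}
```

For example, run set_fstab_options /etc/fstab / noatime,data=writeback,barrier=0,nobh,errors=remount-ro > /tmp/fstab.new, then check diff /etc/fstab /tmp/fstab.new before copying the new file into place as root. Keeping a copy of the original fstab also makes reverting easy if the system has trouble booting.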

The main change is replacing atime/relatime with noatime. This stops the file system from writing a new access time to a file’s inode every time the file is read. Think about it: writing to the file system for every read of the file system? Crazy!

Next is data=writeback. In this mode only metadata is journaled, and file data may be written to disk after the corresponding metadata has been committed to the journal. The file system’s own structure stays consistent, but after a crash the most recent changes may be lost and files may contain stale data (so you may jump back into the past a bit).

Next is barrier, which is slightly more dangerous:

barrier=<0|1(*)>  This enables/disables the use of write barriers in
                  the jbd code. barrier=0 disables, barrier=1 enables.
                  This also requires an IO stack which can support
                  barriers, and if jbd gets an error on a barrier
                  write, it will disable again with a warning.
                  Write barriers enforce proper on-disk ordering
                  of journal commits, making volatile disk write caches
                  safe to use, at some performance penalty. If
                  your disks are battery-backed in one way or another,
                  disabling barriers may safely improve performance.

Next is nobh:

bh (*)
nobh    ext4 associates buffer heads to data pages to
        (a) cache disk block mapping information
        (b) link pages into transaction to provide
            ordering guarantees.
        "bh" option forces use of buffer heads.
        "nobh" option tries to avoid associating buffer
        heads (supported only for "writeback" mode).

You can skip barrier and nobh if you’d like. noatime and data=writeback are the big ones.
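As a sketch of that minimal change (noatime plus data=writeback, keeping your other options), the hypothetical function below rewrites an options list: it drops atime, relatime, strictatime and any existing data= mode, keeps everything else in order, and appends noatime,data=writeback. It only transforms a string; it does not touch your fstab.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a comma-separated fstab options list so it
# uses noatime and data=writeback, keeping all other options.
# Usage: tune_opts OPTLIST   (prints the new list)
tune_opts() {
    printf '%s\n' "$1" | tr ',' '\n' | awk '
        # Drop the access-time options that noatime replaces.
        $0 == "atime" || $0 == "relatime" || $0 == "strictatime" { next }
        # Drop noatime and any data= mode; they are re-added at the end.
        $0 == "noatime" || /^data=/ { next }
        # Collect the remaining options, comma-separated, in original order.
        $0 != "" { out = out (out == "" ? "" : ",") $0 }
        END { print out (out == "" ? "" : ",") "noatime,data=writeback" }
    '
}
```

For example, tune_opts relatime,errors=remount-ro prints errors=remount-ro,noatime,data=writeback.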

6) Reboot your system.

If you have any trouble booting, just boot a recovery disk and revert the fstab changes.
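Once the machine is back up, you can check which options the root filesystem is actually mounted with by reading /proc/mounts. The two helpers below are hypothetical sketches: mount_opts prints the options field for a mount point (taking the last matching line, since some systems also list a rootfs entry for /), and has_opt tests whether a single option appears in a comma-separated list. The exact option names the kernel reports may vary between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and print the
# options field of the last entry mounted at MOUNTPOINT.
# Usage: mount_opts MOUNTPOINT < /proc/mounts
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }'
}

# Hypothetical helper: succeed if OPTION is one of the entries in OPTLIST.
# Wrapping both in commas avoids partial matches (atime vs noatime).
# Usage: has_opt OPTLIST OPTION
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a live system:
#   opts=$(mount_opts / < /proc/mounts)
#   has_opt "$opts" noatime && echo "noatime is active"
```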

EDIT: Updated to no longer require recovery disk booting thanks to Nicolas Alpi’s response post.

  • Pingback: Accelerate your test in Rails with Ubuntu | Not Geekly Correct

  • Ward

    Nice hack.

    If the filesystem is the bottleneck for your test suite, it might be a smart investment to purchase a solid-state drive.

  • Nick Gauthier

    Funny you should mention that, my SSD came in the mail yesterday! However, I saw zero improvement in test times.

    I think it’s because the data=writeback tweak causes the writes to be lazily synchronized behind the task that is running. Since the test suite accesses the disk frequently but not excessively, a normal hard disk is able to keep up.

    -Nick Gauthier

  • Pingback: Caffeine Driven Development » Blog Archive » L33t Links #64

  • http://www.notgeeklycorrect.com Nicolas

    What about everyday life with an SSD? Do you think it’s worth the price? Did you see much improvement?

  • Nick Gauthier

    I got an OCZ Vertex 30gb SSD for $110 after rebate. I highly recommend OCZ as a brand for everything! They are great.

    For everyday tasks (and programming) it’s well worth the price. I’m buying one for all my computers.

    Your gems and software install much more quickly, your Rails env boots a little faster, programs open faster, and your machine will shut down in the blink of an eye.

    -nick

  • Pingback: Oracle on ext4 warning « Oracle Stuff I Should Have Known !

  • Bron Gondwana

    Why not just run your test suite on tmpfs? (assuming it will fit in RAM – RAM is cheap)

    If you already don’t care about data integrity (and let’s face it, this is a test suite) then go the whole hog!

  • http://smartlogicsolutions.com/wiki/Nick_Gauthier Nick Gauthier

    @Bron

    Excellent idea, and I tried it back when I was writing this blog post (actually I mounted a ramdisk, but it’s the same thing).

    I found that while it did speed up some tasks, it did not affect the speed of the test suite.

    This is because all of the files and data for the test suite already get stored in memory by the kernel after they’ve been loaded, so it never really touches the disk except to write the data back every second or so.

    -Nick

  • Andrius

    I am not completely sure, but I think the tweak in this post is a bit careless. I recommend reviewing the ext4 options. My arguments:

    Writeback mode of course provides the best performance, because no journaling is used. This means that after an unclean shutdown you will get a corrupted file system and no options to gracefully repair it. By default data=ordered is used, which means that pseudo-journaling is used. The ordered option is not the slowest and safest one, but it can repair itself in many situations. In addition to data=ordered, commit=5 is used by default, which means that data is pseudo-journaled every 5 seconds. In other words, if you lose power, you will lose as much as the last 5 seconds of work, but your filesystem will not be damaged, thanks to the journaling. If you want to speed up your disks, you don’t need to leave your system completely vulnerable; just change the commit value to as many seconds as you want, trading off speed against losing the latest changes.

    noatime is a good option to speed up your system, but it can break mail readers like mutt and other applications that need to know whether a file was read after it was written, because inode access times are no longer updated on the file system.

    References:
    http://www.mjmwired.net/kernel/Documentation/filesystems/ext4.txt
    http://wiki.archlinux.org/index.php/Fstab

  • http://www.smartlogicsolutions.com/nick Nick Gauthier

    @Andrius

    Actually, you’re incorrect.

    http://linux.die.net/man/8/tune2fs

    journal_data_writeback does keep a journal. It is very similar to data=ordered,commit=5, except that instead of every 5 seconds, the time span is not defined. Therefore it can be done at the kernel’s discretion, yielding the best speedup.

    Much like data=ordered,commit=5, you will lose a certain amount of uncommitted data in case of a power outage or other hard crash. However you will not corrupt your system.

    What you mentioned about atime, however, is completely true. Thankfully I don’t use mutt or other mail readers.

    -Nick

  • Andrius

    I’m not completely sure, but it seems data=writeback and journal_data_writeback is not the same. Even data=ordered doesn’t use normal journaling. When writeback mode is selected, only metadata journaling is used, which is not the same as data journaling; there is no data journaling at all. So this mode can be destructive on a system crash.

    journal_data_writeback is option, when full journaling is enabled with data=journal. data=journal is the slowest and safest mode. As you said journal_data_writeback keep a journal and gives best performance in data=journal mode.

  • Grogan

    Andrius says: “I’m not completely sure, but it seems data=writeback and journal_data_writeback is not the same.”

    Yes, it is. It is simply setting the default journal option in the superblock. The reason for that first step with tune2fs is in case that is your root filesystem. You cannot remount a filesystem and change the journal mode, it has to be completely dismounted first.

    In just about every distro (boot loader option) your filesystem will already be mounted read-only before fstab options come into play, and trying to change the journal mode will result in mount refusing to remount the filesystem, it will stay in read-only mode and your system will not boot properly. (you’ll be lucky to get a shell depending on distro and certainly, if you’re doing this remotely, you’d be buggered)

    “journal_data_writeback is option, when full journaling is enabled with data=journal. data=journal is the slowest and safest mode. As you said journal_data_writeback keep a journal and gives best performance in data=journal mode.”

    No, they are mutually exclusive. data=journal would override the journal_data_writeback setting in the superblock at mount time.

    You’re correct about the pseudo journaling though (metadata only, whether ordered or writeback modes). The only journal option that gives you real journaling is data=journal.

    ———————————

    Nick, thanks for the blog post. I came across it while looking for mount options to improve ext4 performance. My view is that you have to do backups anyway, so it’s better to use options that favour performance over data safety. At least on my own workstations, anyway.

  • Pingback: Admin: Linux file server performance boost (ext4 version) | Cypris' lookout

  • James

    EXT4 can also be created without a journal. Full journaling on EXT4 turns off delayed allocation, as does the mount option ‘nodelalloc’; combined with ordered mode, that makes it behave like EXT3.

    http://kernelnewbies.org/Ext4

  • http://ngauthier.com Nick Gauthier

    Cool, thanks James. Slightly more dangerous, but probably even better for performance.

  • http://www.noveda.com/ Joshua Dickerson

    Instead of rebooting, just use “mount -a”

  • http://mikebabcock.ca Michael T. Babcock

    This discussion is one of the reasons I partition my drives with more than one simple root partition; different file system parameters suit different types of data.

    For example, if you use two, a root and a /home filesystem, you can set your root (/) filesystem to use fully journaled writes with noatime, giving fast reads, slower writes and almost no possibility of data corruption where your binaries live. Then you can play a bit more fast and loose with /home if you like.

  • Pingback: Improve File-Systems Performance « XT Zone

  • Pingback: Install MySQL on SSD, best mount options? - Admins Goodies

  • LohPhat

    1. What is the default journal mode if the “journal_data” | “journal_data_ordered” | “journal_data_writeback” are not specified? Is it distro specific? If so, how do I find out?

    Is the /etc/fstab mount option “data=writeback” redundant if “journal_data_writeback” is set in the superblock?

  • everge48

    ^ Yes it is redundant.

  • Atillâ

    …a little bit off-topic, but if your test environment uses lots of small files, why don’t you try jfs with just the “noatime” option in fstab (it’s a great FS for sendmail/postfix mail servers, for example), or xfs tweaked with “noatime,nodiratime,logbufs=8” options in fstab?

    Within our vmsphere guests, we generally use xfs nowadays with these tweaks, as long as you don’t use RAID disks within the guest (those need different fs parameter optimizations).

    Especially the xfsdump package gives a lot of tools to do online maintenance on the FS, especially together with crond…

    I’m curious if those tweaked partitions might perform as well as your ext4 tweaked partition (btw. we adopted your tweak for ext4 partitions without barrier=0 & nobh options (don’t want to be on the dark side of the force))…

    Regards & thanks.

  • Marcus Sundberg

    Changing from the default strictatime setting to either relatime or noatime
    will significantly decrease the number of writes to the filesystem.

    I severely doubt that changing from relatime to noatime will make a
    noticeable performance difference though. Basically the only time when
    that should make a difference is if reads and writes to a given file
    alternate, so that every other operation is a write. To me that sounds
    like an unlikely access pattern.

    //Marcus

  • Pingback: Ext4 ? Extra merdique v4 ? « linux aventure

  • swaroop

    I tried this out, guys, but it only slowed my device down. Copying the same file,

    With all the options: [real 1m58.010s]
    Without all the options: [real 1m53.522s]

    Try for yourselves. Thanks for the effort though.

  • http://doxsee.info Stephen

    For the record, I needed all the options (i.e. noatime,data=writeback,barrier=0,nobh,errors=remount-ro) to get any significant speedup on my Ubuntu VGN-NW110D laptop. Although, my overall test time went down from 3mins 42secs to just 12secs!

  • Jack

    Found this post while looking up other ext4 options. Tried it on a CentOS 6.4 x86_64 domU under XCP 1.6.10 with 4x Intel 530 SSDs in RAID10 (LSI 9271-8i w/CV) set to WriteBack w/BBU. Xen is utilizing ext, file-based VHD storage formatted using ext3.

    On VM, ext4 VHD without options – 523MB/s write speed
    On VM, ext4 VHD with options – 806MB/s write speed

    The XCP dom0 write speed was 624MB/s.

    Thumbs up.

  • Pingback: Improving MySQL insert performance | Freevudu

  • Pingback: Rspec performance redux | technpol

  • alt

    Do not forget to add rootflags=data=writeback to kernel options via grub config. On Ubuntu edit /etc/default/grub and change GRUB_CMDLINE_LINUX_DEFAULT variable to something like:

    GRUB_CMDLINE_LINUX_DEFAULT="rootflags=data=writeback quiet splash"

    After that run update-grub.
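A sketch of that edit with sed, assuming the stock Ubuntu layout where GRUB_CMDLINE_LINUX_DEFAULT is set on one line in double quotes. The function name add_rootflags is hypothetical; it prints the edited file rather than changing it in place, and leaves the file alone if rootflags=data=writeback is already there.

```shell
#!/bin/sh
# Hypothetical helper: print a grub defaults file with rootflags=data=writeback
# added at the start of GRUB_CMDLINE_LINUX_DEFAULT.
# Usage: add_rootflags FILE
add_rootflags() {
    if grep -q 'rootflags=data=writeback' "$1"; then
        cat "$1"    # already present: print the file unchanged
    else
        # "&" in the replacement is the matched text, so the flag is inserted
        # right after the opening double quote.
        sed 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&rootflags=data=writeback /' "$1"
    fi
}
```

For example, run add_rootflags /etc/default/grub > /tmp/grub.new, review the result, copy it over /etc/default/grub as root, and then run update-grub.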

  • Pingback: Optimizar Ext4 y Ext3 | Paco Rabadán

  • Adam Ward

    Hello,

    You state above that data=writeback writes the metadata out in a lazy manner. This is incorrect, please see man page for tune2fs:

    journal_data_writeback
    When the filesystem is mounted with journalling enabled, data may be written into the main filesystem after its metadata has been committed to the journal. This may increase throughput, however, it may allow old data to appear in files after a crash and journal recovery.

    This would imply that the metadata is written first. In the event of a system failure the metadata may be updated, however the file contents could be stale.

  • Pingback: ZFS iSCSI Benchmark Tests on ESX | VirtuallyHyper

  • Nathan Thern

    I got an almost 10x increase in write performance with these options … thanks! Another problem I had was that rsync would not recognize files that had already transferred. The new options fixed that.

  • Illia Rudenko

    It is better to use
    # mount -o remount /dev/sda