Using fault injection

Lately, I was playing with logfs, a scalable flash filesystem. It’s interesting because of at least three reasons:

  • it can compress data transparently,
  • it provides wear levelling,
  • it works with block devices.

In other words, a perfect filesystem to use for all flash-based devices, like USB-sticks.

Any filesystem should aim for high reliability, no doubt about it. But how to test it? Linux kernel provides a nice feature called “fault injection”. If Documentation/fault-injection/fault-injection.txt isn’t clear for you after the first read, here’s a quick help.

You need fault injection and debugfs compiled in the kernel.

First, mount debugfs:

mount debugfs /debug -t debugfs
mount -t logfs /dev/sdb1 /mnt/2

Then, set up fault injection:

cd /debug/fail_make_request
echo 10 > interval # interval
echo 100 > probability # 100% probability
echo -1 > times # how many times: -1 means no limit

Mount the filesystem on the device, start copying files etc. and then, make the device fail:

echo 1 > /sys/block/sdb/sdb1/make-it-fail

Logfs was working for some time, but eventually, it failed - we can blame a proprietary nvidia module which tainted the kernel, of course :)

FAULT_INJECTION: forcing a failure
[<c0104d19>] show_trace_log_lvl+0×1a/0×2f
[<c01057b1>] show_trace+0×12/0×14
[<c0105837>] dump_stack+0×15/0×17
[<c01e87d8>] should_fail+0×113/0×144
[<c01d40f2>] generic_make_request+0×167/0×2c0
[<c01d6142>] submit_bio+0xd7/0xdf
[<c01837e6>] submit_bh+0xbc/0xd9
[<c0185e84>] block_read_full_page+0×26c/0×27c
[<c0187823>] blkdev_readpage+0xf/0×11
[<c014aecb>] read_cache_page_async+0×96/0×129
[<c014c903>] read_cache_page+0×12/0×42
[<c01c3aca>] bdev_read+0×5d/0xcd
[<c01bd9f8>] scan_segment+0×16e/0×331
[<c01bdbe8>] logfs_scan_pass+0×2d/0×71
[<c01bdce9>] logfs_gc_pass+0×44/0×3f3
[<c01c07a0>] logfs_get_wblocks+0×21/0×2f
[<c01c1751>] logfs_write_buf+0×22/0×30b
[<c01bd790>] logfs_commit_write+0xed/0×101
[<c014c11f>] generic_file_buffered_write+0×3a9/0×522
[<c014c6d3>] __generic_file_aio_write_nolock+0×43b/0×471
[<c014c76d>] generic_file_aio_write+0×64/0xc1
[<c01672f6>] do_sync_write+0xc4/0×101
[<c0167af5>] vfs_write+0xaf/0×138
[<c0167fe2>] sys_write+0×3d/0×61
[<c0103da2>] sysenter_past_esp+0×5f/0×99
=======================
Buffer I/O error on device sdb1, logical block 35497
————[ cut here ]————
kernel BUG at fs/logfs/gc.c:88!
invalid opcode: 0000 [#1]
PREEMPT
Modules linked in: iptable_nat nf_nat ipt_ULOG ipt_recent nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables capability usblp commoncap loop dm_mod video thermal sbs fan container dock battery ac floppy cpufreq_conservative cpufreq_powersave processor pcspkr kqemu af_packet usb_storage snd_pcm_oss snd_mixer_oss snd_via82xx snd_ac97_codec ac97_bus snd_pcm ehci_hcd snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore nvidia(P) uhci_hcd evdev i2c_viapro via_agp agpgart i2c_core usbcore 8139cp via_rhine 8139too sr_mod mii tsdev sg cdrom
CPU: 0
EIP: 0060:[<c01bd9fc>] Tainted: P VLI
EFLAGS: 00010282 (2.6.22.5-1 #2)
EIP is at scan_segment+0×172/0×331
eax: fffffffb ebx: c02cfa98 ecx: c1809d80 edx: 00000000
esi: 08aa0000 edi: c749fbdc ebp: c749fc1c esp: c749fb8c
ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
Process rsync (pid: 3940, ti=c749e000 task=d79574e0 task.ti=c749e000)
Stack: 0000001c c749fbdc 0000031a 00000000 0000899c f695dff4 00000013 00000000
00000455 dd4c1c00 00020000 00000000 022cfa98 00009114 00009114 00000000
00000455 00000000 0000101c d7e7c800 5ccc4658 00040010 00000000 13000000
Call Trace:
[<c0104d19>] show_trace_log_lvl+0×1a/0×2f
[<c0104dc9>] show_stack_log_lvl+0×9b/0xa3
[<c0104fa8>] show_registers+0×1d7/0×30c
[<c01051db>] die+0xfe/0×1d6
[<c02bddd7>] do_trap+0×89/0xa2
[<c0105605>] do_invalid_op+0×88/0×92
[<c02bdbb2>] error_code+0×6a/0×70
[<c01bdbe8>] logfs_scan_pass+0×2d/0×71
[<c01bdce9>] logfs_gc_pass+0×44/0×3f3
[<c01c07a0>] logfs_get_wblocks+0×21/0×2f
[<c01c1751>] logfs_write_buf+0×22/0×30b
[<c01bd790>] logfs_commit_write+0xed/0×101
[<c014c11f>] generic_file_buffered_write+0×3a9/0×522
[<c014c6d3>] __generic_file_aio_write_nolock+0×43b/0×471
[<c014c76d>] generic_file_aio_write+0×64/0xc1
[<c01672f6>] do_sync_write+0xc4/0×101
[<c0167af5>] vfs_write+0xaf/0×138
[<c0167fe2>] sys_write+0×3d/0×61
[<c0103da2>] sysenter_past_esp+0×5f/0×99
=======================
Code: 00 00 00 0f a5 f7 d3 e6 f6 c1 20 0f 45 fe 0f 45 f0 8b 45 94 89 f9 89 f2 8d 7d c0 03 55 a8 13 4d ac 89 7c 24 04 ff 13 85 c0 74 04 <0f> 0b eb fe 8d 45 c0 b9 1c 00 00 00 ba ff 00 00 00 e8 82 55 00
EIP: [<c01bd9fc>] scan_segment+0×172/0×331 SS:ESP 0068:c749fb8c

Hopefully, the bug is not there anymore, as I notified the logfs’ author, Jörn Engel. He also fixed some other bugs I found.

I wonder if and how easy other filesystems can be abused with the fault injection?

One Comment

  1. Kristina:

    Hopefully the bug has gone for good. I think it is relativley easy for other file systems to be abused with the fault injection.

Leave a comment