Simulate a faulty block device with read errors?

Question

I'm looking for an easier way to test my application against faulty block devices that generate I/O read errors when certain blocks are read. Using a physical hard drive with known bad blocks is a pain, and I would like to find a software solution if one exists.

I did find the Linux Disk Failure Simulation Driver which allows creating an interface that can be configured to generate errors when certain ranges of blocks are read, but it is for the 2.4 Linux Kernel and hasn't been updated for 2.6.

What would be perfect is a losetup and loop driver that can also be configured to return read errors when reading from a given set of blocks.

In addition to the answers see the [list of Linux disk fault injection mechanisms](https://unix.stackexchange.com/a/144200/61610) over on the [Special File that causes I/O error Unix & Linux question](https://unix.stackexchange.com/questions/77492/special-file-that-causes-i-o-error/144200#144200). — Anon, Oct 24 '17 at 20:07

Peter Cordes · Accepted Answer · 2021-02-04T21:18:12.383

30

It's not a loopback device you're looking for, but rather device-mapper.

Use dmsetup to create a device backed by the "error" target. It will show up in /dev/mapper/<name>.

Page 7 of the Device mapper presentation (PDF) has exactly what you're looking for:

dmsetup create bad_disk << EOF
  0 8       linear /dev/sdb1 0
  8 1       error
  9 204791 linear /dev/sdb1 9
EOF

Or leave out the sdb1 parts and use the "error" target for blocks 0 - 8 as well (instead of sdb1) to make a pure error disk.
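For more than one bad sector, writing the table by hand gets tedious. Here is a minimal sketch of a generator for such a table; the function name `error_table` and its parameters are hypothetical, and it assumes the units are 512-byte sectors, as in the table above.

```python
def error_table(total_sectors, bad_sectors, device):
    """Build dmsetup table text: linear mappings to `device`,
    with a one-sector "error" target at each bad sector."""
    lines = []
    pos = 0  # next sector that still needs a mapping
    for bad in sorted(set(bad_sectors)):
        if not 0 <= bad < total_sectors:
            raise ValueError(f"bad sector {bad} is outside the device")
        if bad > pos:
            # Healthy run before this bad sector, mapped 1:1 to the device.
            lines.append(f"{pos} {bad - pos} linear {device} {pos}")
        lines.append(f"{bad} 1 error")
        pos = bad + 1
    if pos < total_sectors:
        # Healthy tail after the last bad sector.
        lines.append(f"{pos} {total_sectors - pos} linear {device} {pos}")
    return "\n".join(lines) + "\n"


# Reproduces the three-line table from the answer.
print(error_table(204800, [8], "/dev/sdb1"), end="")
```

The output can be piped into `dmsetup create bad_disk` the same way as the here-document.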

See also The Device Mapper appendix from "RHEL 5 Logical Volume Manager Administration".

There's also a flakey target, a combination of linear and error that succeeds only some of the time, and a delay target that introduces intentional delays for testing.
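As a rough sketch, table lines for those two targets could be built like this. The helper names are hypothetical, and the argument order is my reading of the kernel's device-mapper documentation (flakey: device, offset, up interval, down interval in seconds; delay: device, offset, delay in milliseconds), so check it against the docs for your kernel before relying on it.

```python
def flakey_line(start, length, device, offset, up_seconds, down_seconds):
    """Table line for dm-flakey: the device behaves normally for
    `up_seconds`, then fails I/O for `down_seconds`, repeating."""
    return f"{start} {length} flakey {device} {offset} {up_seconds} {down_seconds}"


def delay_line(start, length, device, offset, delay_ms):
    """Table line for dm-delay: I/O is held back by `delay_ms` milliseconds."""
    return f"{start} {length} delay {device} {offset} {delay_ms}"


print(flakey_line(0, 204800, "/dev/sdb1", 0, 10, 5))
print(delay_line(0, 204800, "/dev/sdb1", 0, 500))
```

Either line can be fed to `dmsetup create <name>` on standard input, like the error table above.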

edited Feb 04 '21 at 21:18

answered Dec 09 '09 at 01:11

Peter Cordes


Worked perfectly -- just what I needed. Thanks! – SteveG Dec 10 '09 at 01:03
The PDF above doesn't explain (at least I didn't understand) the command line syntax for "dmsetup create". The man page explains it's: dmsetup create dev_name dev_mapper_table. The second arg (dev_mapper_table) is a text file that describes how blocks are mapped. The dmsetup man page is terse and doesn't explain the syntax of this table. Here's a link that explains it... [link](https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/device_mapper.html#dm-mappings) – Roger Heathcote Aug 25 '12 at 18:52
Will the disk "error" resolve itself after I restart the machine? – Maksim Nov 08 '18 at 17:08
@Maksim: Even without restarting, you can still read from `/dev/sdb1` directly. But yes, unless you put that `dmsetup` command or equivalent config in a startup file, that virtual device which gives an error-injected view of `/dev/sdb1` won't survive a reboot. – Peter Cordes Nov 08 '18 at 21:07
@PeterCordes what I meant is this: How do you undo this once I forcefully caused bad disk? Will the restart of the machine/vm fix the "error" by making my volume ReadWrite? – Maksim Nov 08 '18 at 21:52
@Maksim: This *doesn't* cause a bad disk. To do that, you'd use `hdparm --make-bad-sector`, which would actually affect your hardware. ([The man page](http://man7.org/linux/man-pages/man8/hdparm.8.html) warns that *that* is very dangerous.) Creating a new virtual block device with device mapper doesn't create errors on the underlying device. – Peter Cordes Nov 09 '18 at 00:21

score 5 · Answer 2 · answered Apr 27 '13 at 20:48


It seems like Linux's built-in fault injection capabilities would be a good idea to use.

Blog: http://blog.wpkg.org/2007/11/08/using-fault-injection/
Reference: https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
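To make that a bit more concrete: as far as I understand the kernel documentation linked above, the `fail_make_request` capability is enabled per device through `/sys/block/<device>/make-it-fail` and tuned through debugfs knobs such as `probability`, `interval` and `times`. The sketch below only builds the list of writes rather than performing them; the function name and default values are hypothetical, and it assumes a kernel built with `CONFIG_FAIL_MAKE_REQUEST` and debugfs mounted at `/sys/kernel/debug`. Note that this fails requests by probability, not by block number.

```python
from pathlib import PurePosixPath


def fail_make_request_plan(device, probability=10, interval=1, times=-1,
                           debugfs="/sys/kernel/debug"):
    """Return (path, value) pairs that root would write to enable
    probabilistic I/O failures on `device` (e.g. "sdb")."""
    knobs = PurePosixPath(debugfs) / "fail_make_request"
    return [
        (str(PurePosixPath("/sys/block") / device / "make-it-fail"), "1"),
        (str(knobs / "probability"), str(probability)),  # percent of requests
        (str(knobs / "interval"), str(interval)),         # consider every Nth request
        (str(knobs / "times"), str(times)),               # -1 means no limit
    ]


for path, value in fail_make_request_plan("sdb"):
    print(f"echo {value} > {path}")
```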


David Foster



While links are good for further reading, you should explicitly present a concrete answer here, I guess. – U. Windl May 05 '17 at 11:12
It also depends which kernel you are using: For example in SLES11 SP4 kernel the feature is not compiled in. – U. Windl May 08 '17 at 12:59

score 4 · Answer 3 · answered Dec 09 '09 at 04:02

The easiest way to play with block devices is using nbd.

Download the userland sources from git://github.com/yoe/nbd.git and modify nbd-server.c to fail at reading or writing on whichever areas you want it to fail on, or to fail in a controllably random pattern, or basically anything you want.
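The check you would add to the server is small. Here it is sketched in Python rather than C, purely to show the logic; this is not code from nbd-server.c, and the function name and the half-open `(start, end)` byte ranges are assumptions made for illustration.

```python
def hits_bad_range(offset, length, bad_ranges):
    """True if the request [offset, offset + length) overlaps any
    half-open (start, end) byte range configured as faulty."""
    end = offset + length
    return any(offset < bad_end and bad_start < end
               for bad_start, bad_end in bad_ranges)


# One faulty 512-byte sector starting at byte 4096.
bad = [(4096, 4608)]
print(hits_bad_range(0, 4096, bad))     # request ends exactly where the bad range starts
print(hits_bad_range(4000, 200, bad))   # request runs into the bad range
```

In the server, a request for which this returns true would be answered with an I/O error instead of data.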

score 0 · Answer 4 · answered Jun 25 '20 at 19:13

I would like to elaborate on Peter Cordes's answer.

In bash, set up an ext4 image on a loopback device, then write a file named binary.bin to it.

imageName=faulty.img
mountDir=$(pwd)/mount

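sudo umount "$mountDir" ## make sure nothing is mounted here

## create a 10 MiB image, format it as ext4, and attach it to a free loop device
dd if=/dev/zero of="$imageName" bs=1M count=10
mkfs.ext4 "$imageName"
loopdev=$(sudo losetup -P -f --show "$imageName"); echo "$loopdev"
mkdir -p "$mountDir"
sudo mount "$loopdev" "$mountDir"
sudo chown -R "$USER:$USER" "$mountDir"

## write 16 known bytes so they can be located in the raw image later
echo "2ed99f0039724cd194858869e9debac4" | xxd -r -p > "$mountDir/binary.bin"

sudo umount "$mountDir"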

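In Python 3 (since bash struggles with binary data), search the image for the magic bytes that were written to binary.bin:

import binascii

# Read the whole image; 10 MiB fits comfortably in memory.
with open("faulty.img", "rb") as fd:
    s = fd.read()

search = binascii.unhexlify("2ed99f0039724cd194858869e9debac4")

find = s.find(search)  # byte offset of the first match, or -1
if find == -1:
    raise SystemExit("magic bytes not found in image")
print(find)

start_sector = find // 512  # dmsetup tables count 512-byte sectors
print(start_sector)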
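If the magic bytes could appear more than once in the image, it helps to list every match. A small sketch, using an in-memory stand-in for the image so it runs without the file; the function name `find_sectors` is hypothetical.

```python
def find_sectors(data, pattern, sector_size=512):
    """Return the sector number of every occurrence of `pattern` in `data`."""
    sectors = []
    pos = data.find(pattern)
    while pos != -1:
        sectors.append(pos // sector_size)
        pos = data.find(pattern, pos + 1)  # continue just past this match
    return sectors


# Stand-in for the image: 1024 zero bytes, the pattern, then more zeros.
pattern = bytes.fromhex("2ed99f00")
image = bytes(1024) + pattern + bytes(2048)
print(find_sectors(image, pattern))  # [2]
```

This reports only the sector where each match starts; a pattern that straddles a sector boundary would also touch the following sector.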

Then, back in bash, create the faulty block device and mount it:

start_sector=## copy value from variable start_sector in python
next_sector=$(($start_sector+1))
size=$(($(wc -c $imageName|cut -d ' ' -f1)/512))
len=$(($size-$next_sector))

echo -e "0\t$start_sector\tlinear\t$loopdev\t0" > fault_config
echo -e "$start_sector\t1\terror" >> fault_config
echo -e "$next_sector\t$len\tlinear\t$loopdev\t$next_sector" >> fault_config

cat fault_config | sudo dmsetup create bad_drive
sudo mount /dev/mapper/bad_drive $mountDir
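Before handing a generated table to dmsetup, it can be worth checking that the segments are contiguous and add up to the image size, since an off-by-one in the arithmetic above is easy to make. A minimal sketch; `check_table` is a hypothetical helper, and `/dev/loop0` and sector 4000 are stand-ins for whatever `$loopdev` and `start_sector` turn out to be on your system.

```python
def check_table(table_text, total_sectors):
    """Raise ValueError if the segments leave a gap, overlap,
    or do not end exactly at `total_sectors`."""
    pos = 0
    for line in table_text.splitlines():
        fields = line.split()  # handles both tabs and spaces
        if not fields:
            continue
        start, length = int(fields[0]), int(fields[1])
        if start != pos:
            raise ValueError(f"segment starts at {start}, expected {pos}")
        pos = start + length
    if pos != total_sectors:
        raise ValueError(f"table ends at {pos}, expected {total_sectors}")
    return pos


# A 10 MiB image is 20480 sectors of 512 bytes.
table = ("0\t4000\tlinear\t/dev/loop0\t0\n"
         "4000\t1\terror\n"
         "4001\t16479\tlinear\t/dev/loop0\t4001\n")
print(check_table(table, 20480))  # 20480
```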

Finally, we can test the faulty block device by reading the file:

cat $mountDir/binary.bin

which produces the error:

cat: /path/to/your/mount/binary.bin: Input/output error
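An application under test would see the same failure as an `OSError`. A minimal sketch of catching it in Python; `try_read` is a hypothetical helper, and `mount/binary.bin` assumes the mount directory used above. On the faulty device I would expect the reported errno to be `errno.EIO`, matching what cat printed.

```python
import errno


def try_read(path):
    """Return (data, None) on success or (None, errno) if the read fails."""
    try:
        with open(path, "rb") as fd:
            return fd.read(), None
    except OSError as exc:
        return None, exc.errno


data, err = try_read("mount/binary.bin")
if err is not None:
    # errno.errorcode maps the number back to a name such as "EIO".
    print("read failed:", errno.errorcode.get(err, err))
else:
    print("read", len(data), "bytes")
```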

Clean up when you're done testing:

sudo umount $mountDir
sudo dmsetup remove bad_drive
sudo losetup -d $loopdev
rm fault_config $imageName
