
For my application, I am attempting to determine whether a data backup system missed any writes. I am doing this by writing an incrementing integer counter to a 1 GB virtual disk; to make sure no writes were missed, I can look at the reverted snapshot and check for gaps (i.e. if I see 1, 2, 3, 0, 0, 6, 7, I know the backup didn't capture writes 4 and 5 correctly). This is all on a CentOS 7 VM, with mostly Python 2.7 scripts for the writes and reads (speed isn't a huge concern).

A big part of my trouble has been caching: since I'm simulating random I/O, writes are often flushed from caches and hit the disk out of order. This makes every test look like a false positive, because data appears to be missing at the time of the snapshot. Again, I don't care about efficiency at all, so I don't mind really slow writes. Reads can use caching; that's not a problem, and doesn't matter much one way or the other.

Here are the things I have done to try to disable caching:

  1. Disabled the disk write cache with sudo hdparm -W 0 /dev/sdb, where /dev/sdb is the virtual disk I'm writing to.
  2. Wrote to the raw disk with no filesystem, so no filesystem caching.
  3. Passed buffering=0 to open in the Python script (no Python-level write buffering).

Is it basically an impossible task to make sure that my writes get put on the disk in sequential order? All I need is write #(n) to happen before write #(n+1), and #(n+1) before #(n+2), etc.

This is the Python script I'm using to write to the disk (SIZE and PRIME change based on the size of the disk and a random seed):

from struct import pack, unpack
import sys

SIZE, PRIME = [x], [x]  # placeholders; the real values are set before this runs

# random I/O traversal iterator
def rand_index_generator(a, b):
    ctr = 0
    while True:
        yield (ctr % b)
        ctr += a

with open('/dev/sdb', 'rb+', buffering=0) as f:
    index_gen = rand_index_generator(PRIME, SIZE)
    # random traversal using iterator above, write counter to file
    for counter in xrange(1, SIZE-16):
        f.seek(index_gen.next()*4)
        f.write(pack('>I', counter))
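For what it's worth, here is a toy check (with made-up values, not my real SIZE and PRIME) showing why the iterator is sound: when PRIME and SIZE are coprime, it visits every slot exactly once per SIZE steps.

```python
def rand_index_generator(a, b):
    ctr = 0
    while True:
        yield ctr % b
        ctr += a

# toy values for illustration only, not the real disk parameters
SIZE, PRIME = 32, 7
gen = rand_index_generator(PRIME, SIZE)
visited = [next(gen) for _ in range(SIZE)]
print(sorted(visited) == list(range(SIZE)))  # True: a full permutation, no slot skipped or repeated
```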

Then, to validate, I traverse in the same order and watch for gaps of unwritten data. This is after reverting the VM to the snapshot. I know all the traversal and writing logic works, since validation passes with no missed writes before reverting; I think some "written" data dies in RAM and never makes it to disk.
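Roughly, the validation pass looks like this. It's sketched here against an in-memory buffer so it can run standalone (the real run reads /dev/sdb); find_gaps and the toy values are illustrative, not my exact script.

```python
from struct import pack, unpack
from io import BytesIO

def rand_index_generator(a, b):
    ctr = 0
    while True:
        yield ctr % b
        ctr += a

def find_gaps(f, size, prime):
    """Re-traverse in write order; report counters whose slot doesn't match."""
    gen = rand_index_generator(prime, size)
    gaps = []
    for counter in range(1, size - 16):
        f.seek(next(gen) * 4)
        (value,) = unpack('>I', f.read(4))
        if value != counter:
            gaps.append(counter)
    return gaps

# toy run: write counters 1..47 into a 64-slot "disk", then lose writes 4 and 5
SIZE, PRIME = 64, 13
disk = BytesIO(bytes(SIZE * 4))
locs = []
gen = rand_index_generator(PRIME, SIZE)
for counter in range(1, SIZE - 16):
    loc = next(gen)
    locs.append(loc)
    disk.seek(loc * 4)
    disk.write(pack('>I', counter))
for lost in (4, 5):                      # simulate writes that died in cache
    disk.seek(locs[lost - 1] * 4)
    disk.write(pack('>I', 0))
print(find_gaps(disk, SIZE, PRIME))      # [4, 5]
```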

Will take any suggestions to guarantee the write order I need for this application.

  • I think your ctr repeats and partially or entirely overwrites ... you're missing something; even with buffering this is certainly writing and doing actions in the order you do them ... – Joran Beasley Apr 26 '22 at 06:04
  • your code throws an error ... presumably you meant `SIZE,PRIME = [x,x]` in which case you are always getting 0 back from your iterator – Joran Beasley Apr 26 '22 at 06:05
  • im pretty sure your actual issue is with rand_index_generator – Joran Beasley Apr 26 '22 at 06:08
  • There is no issue with the random generator. Everything works 100% fine before I restore the disk, then I start seeing missing blocks. The [x],[x] is a placeholder because I set that value before this function is entered. Was trying to be concise – Henry Dikeman Apr 26 '22 at 14:55
  • you haven't given us enough to reproduce the problem ... please provide the actual values you use ... – Joran Beasley Apr 27 '22 at 02:09
  • @JoranBeasley thank you for attempting an answer. This was a tough question to ask since the problem arises only with the snapshot program that I was testing (hard to replicate). Included an answer in case anyone else wants to know how to fully bypass OS caches to guarantee write order. Cheers – Henry Dikeman Apr 28 '22 at 07:08

1 Answer


Found out the answer to this question. I misunderstood the effect of writing to a raw disk: it did not eliminate OS caching, since I was still going through the OS to write to the raw device. Oops.

To bypass the OS caches you should use os.open and pass the os.O_DIRECT and os.O_SYNC flags, which make sure writes happen in the correct sequence (more info on those flags) and don't sit in volatile memory. I used mmap with an os file descriptor, because O_DIRECT requires page-aligned transfers, but you could also use normal file handles.

The page size is specific to your operating system; on x86 Linux it is 4096.
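You don't have to hardcode it, either; Python's standard library will tell you (nothing here is specific to my script):

```python
import mmap
import os

print(mmap.PAGESIZE)               # typically 4096 on x86 Linux
print(os.sysconf('SC_PAGE_SIZE'))  # same value, via sysconf
# mmap offsets must be a multiple of this (equals the page size on Linux):
print(mmap.ALLOCATIONGRANULARITY)
```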

The top section of the code stayed the same but here is the write loop:

import os
import mmap

PAGESIZE = 4096  # x86 Linux page size; see note above
filedesc = os.open('/dev/sdb', os.O_DIRECT | os.O_SYNC | os.O_RDWR)
for counter in xrange(1, SIZE - 16):
    write_loc = index_gen.next() * 4
    page_dist = write_loc % PAGESIZE    # offset of the write within its page
    offset = write_loc - page_dist      # page-aligned start, as mmap requires
    bytemap = mmap.mmap(filedesc, PAGESIZE, offset=offset)
    bytemap[page_dist:page_dist + 4] = pack('>I', counter)
    bytemap.flush()   # push this write out before starting the next one
    bytemap.close()
os.close(filedesc)
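If O_DIRECT turns out to be troublesome (it has strict alignment rules), a slower but simpler variant is O_SYNC alone: the page cache is still used, but each write is forced to stable storage before the call returns, so program order is preserved. A sketch (write_counter_sync is my name, and the path would be whatever device or file you're testing against):

```python
import os
from struct import pack

def write_counter_sync(path, byte_offset, counter):
    # O_SYNC makes each write durable before os.write returns, so writes
    # reach the disk in program order, with no O_DIRECT alignment rules
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        os.lseek(fd, byte_offset, os.SEEK_SET)
        os.write(fd, pack('>I', counter))
    finally:
        os.close(fd)
```

Opening and closing the descriptor per write is wasteful; in a real loop you'd open once and keep the fd around.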