If I create a file, use lseek(2) to jump to a high position in the (empty) file, and then write some valuable information there, I create a sparse file on a Unix system (this probably depends on the file system, but let's assume a typical Unix file system such as ext4, where this is the case).
If I then lseek(2) to an even higher position in the file and write something there as well, I end up with a sparse file that contains the valuable information somewhere in its middle, surrounded by huge unallocated holes. I'd like to find this valuable information within the file without having to read it completely.
Example:
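$ python
f = open('sparse', 'wb')   # binary mode, so offsets are plain byte positions
f.seek((1 << 40) + 42)     # jump to 1 TiB + 42 bytes
f.write(b'foo')            # the valuable information
f.seek((1 << 40) * 2)      # jump to 2 TiB
f.write(b'\0')             # a single byte here sets the file size
f.close()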
This creates a 2 TiB file that uses only 8 KiB of disk space:
$ du -h sparse
8.0K sparse
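The same comparison can be made from Python, as a minimal sketch. The helper name sparse_usage is hypothetical, and the sketch assumes a platform where st_blocks is reported in 512-byte units (as on Linux):

```python
import os

def sparse_usage(path):
    """Return (apparent_size, allocated_bytes) for the file at path.

    Assumption: st_blocks counts 512-byte units, as it does on Linux.
    """
    st = os.stat(path)
    apparent = st.st_size           # logical length, holes included
    allocated = st.st_blocks * 512  # space actually backed by disk blocks
    return apparent, allocated
```

For the file above, sparse_usage('sparse') should report an apparent size of about 2 TiB next to an allocated size of only a few KiB, which is what du shows.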
Somewhere in the middle of it (at 1 TiB + 42 bytes) is the valuable information (foo).
I can find it using cat sparse of course, but that will read the complete file and print immense amounts of zero bytes. Extrapolating from tests with smaller sizes, this method would take about 3 hours to print the three characters on my computer.
The question is:
Is there a way to find the information stored in a sparse file without reading all the empty blocks as well? Can I somehow find out where empty blocks are in a sparse file using standard Unix methods?