
Let's say that the intent is to create a file with a large hole at the beginning, to be filled in later, on an embedded device running Linux. We open the file, obtain a file descriptor, and call lseek on it to seek to a certain, known position. Later, when we want to write to the file at that position, we call write on it.

However, on the first write the hole created by seeking gets zero-filled, and if the hole is large enough this operation can take some time. In my application there is no need for this zero-initialization: the hole has an exact, known length, and I will fill it with my own data later.

Is there a way to avoid having the first write call after seek zero-fill the hole (even if it involves modifying the filesystem driver)? Alternatively, is there a way of writing to a file before the beginning of the file (appending to the front of the file)?

Sam Protsenko

4 Answers

8

This is likely related to your filesystem. On ext2/3/4, ReiserFS, Btrfs, XFS, and the like, doing what you describe should not take a long time, because they support what are called "sparse files": files that occupy less space in the underlying storage than their nominal size, because runs of zeroes aren't physically stored.

You might try an experiment with dd to make sure this is the case:

$ dd if=/dev/zero of=whatever bs=1k seek=1073741824 count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 9.1878e-05 s, 11.1 MB/s
$ ls -al whatever
-rw-r--r-- 1 xxxx xxxx 1099511628800 Jan 31 18:04 whatever
$ du -h whatever
16K whatever

On your filesystem, that experiment probably doesn't produce a sparse file (the dd takes noticeable time and du reports the full size). If so, and what you need really is a sparse file, determine whether you can use a different filesystem.

the paul
  • You are right. In my case, running `dd if=/dev/zero of=/mnt/mmc/whatever bs=1k seek=100000` takes around 10 seconds, and the resultant file size is 97.7M. Filesystem is vfat, and unfortunately I cannot change this, since I'm operating on an SD-card, which should be readable across multiple systems. – Nebojsa Mrmak Feb 01 '16 at 00:33
  • Yep, none of the FAT filesystems support any sort of sparse file storage that I know of. There is probably no way to get exactly what you are asking for, but perhaps there is a good solution to the actual problem you're trying to solve. – the paul Feb 01 '16 at 00:36
  • @NebojsaMrmak: Take a look at this [other answer](http://stackoverflow.com/a/4396912/865874), HTH. – rodrigo Feb 01 '16 at 00:40
  • @rodrigo: Thank you, that might help. I'll see if modifying the driver could work, but more importantly, if `posix_fallocate()` call could be used or modified to "background" these zero-byte writes. – Nebojsa Mrmak Feb 01 '16 at 00:48
  • @NebojsaMrmak: You could try calling `posix_fallocate()` from a thread or a subprocess. Maybe it will create a race with the `write()`... maybe it locks the file, so the `write()` has to wait... maybe it works... – rodrigo Feb 01 '16 at 07:24
1

> However, on the first write the hole created by seeking gets zero-filled and if the hole is large enough, this operation can take some time.

No, it can't. The write() will just store the data you passed to it. The zeros in the unwritten portion aren't physically there: they are an artefact of the file system.

user207421
  • While your comment is true, it does not say whether there is a way around this behaviour? What I am looking for is a workaround around this fact. – Nebojsa Mrmak Feb 01 '16 at 00:50
  • If my *answer* is true, there is no behaviour to work around. The zero-byte writes you're asking about are imaginary. Or else my answer is false. You can't have it both ways. – user207421 Feb 01 '16 at 05:22
  • Inaccurate: not all filesystems support sparse files. OP is using vfat, which definitely does not. – the paul Feb 01 '16 at 06:38
0

This may not be a feasible solution for your use case for various reasons, but I can imagine splitting the large file into serially numbered chunks. A missing or zero-sized chunk is supposed to contain zeroes (or some other fixed value). Choose the chunk size to fit the space you want to reserve and to get a good compromise between file size and number of chunks.

Or make it a bit more complicated and use variable chunk sizes, with the "virtual" size of the individual chunk stored somewhere else. Given a complex enough numbering system you can even insert new chunks without renaming the subsequent chunk files...

Of course you'll need an additional access layer to do the de-chunking, either in your application code if that's sufficient, or worst-case as kernel driver hooking into the file handling.

Murphy
-1

Have you tried using the `MAP_UNINITIALIZED` flag with `mmap()`?

Soren
  • According to the linked man page, this flag is only available for anonymous memory, that is, memory that is not backed by a file. – rodrigo Feb 01 '16 at 00:36