
NTFS does support sparse files, but I want to make sure the files I have to write to (which might have been created, marked as sparse, and partially filled by another application) are fully allocated, so that I won't get an out-of-space error when writing to the middle of such a file later (i.e. if out-of-space errors are going to happen, they should happen now).

Is there a WinAPI function to ensure a sparse file is fully allocated (preferably atomically), like posix_fallocate() on POSIX systems? If not, how do I preallocate it?
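For reference, this is the POSIX behaviour the question is looking for: posix_fallocate() reserves storage for a byte range up front, so a lack of space is reported by that call (as ENOSPC) rather than by some later write. A minimal POSIX-only sketch (`reserve_file` is a hypothetical helper name; nothing here is available through the WinAPI):

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure bytes [0, size) of the file at `path` are
 * backed by allocated storage, creating the file if needed.
 * Returns 0 on success, otherwise an error number (e.g. ENOSPC). */
int reserve_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return errno;
    /* posix_fallocate() returns the error code directly; it does not set errno. */
    int rc = posix_fallocate(fd, 0, size);
    if (close(fd) != 0 && rc == 0)
        rc = errno;
    return rc;
}
```

If the call succeeds, the file is at least `size` bytes long and later writes inside that range should not fail for lack of space.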

I don't think these are duplicates:

lvella
  • maybe [`SetFileValidData`](https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfilevaliddata) (internally this uses [`FileValidDataLengthInformation`](https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/5c9f9d50-f0e0-40b1-9b84-0b78f59158b1)) – RbMm Jun 10 '23 at 16:26
  • https://github.com/rclone/rclone/issues/4245 – RbMm Jun 10 '23 at 16:29
  • Hmm, *sparse files* and *fully allocated* are mutually exclusive. Pick one. – Eljay Jun 10 '23 at 16:40
  • @RbMm per the link you provided for `FileValidDataLengthInformation`, if the file is sparse I get a `STATUS_INVALID_PARAMETER` error. – lvella Jun 10 '23 at 16:41
  • @Eljay They are not. A file can have `FILE_ATTRIBUTE_SPARSE_FILE` set and still happen to be fully allocated. – lvella Jun 10 '23 at 16:42
  • Then just make sure that is, indeed, the case. – Eljay Jun 10 '23 at 16:44
  • @Eljay My question is specifically how. If the file doesn't have the sparse attribute, I'm good. If it does have, what do I do? – lvella Jun 10 '23 at 16:45
  • *"If the file is sparse I get a `STATUS_INVALID_PARAMETER` error."* - That is to be [expected](https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfilevaliddata): *"The file cannot be a network file, or be compressed, **sparse**, or transacted."* – IInspectable Jun 10 '23 at 16:46
  • @IInspectable Indeed, so that can't possibly be the answer to my question, as I am asking specifically how to deal with sparse files. – lvella Jun 10 '23 at 16:50
  • The answer is: fully allocate the file, don't use sparse files. – Eljay Jun 10 '23 at 17:08
  • @Eljay I can't, I get the file from the user. – lvella Jun 10 '23 at 17:09
  • *sparse file is fully allocated* - how would that work? In that case it is already not a sparse file. – RbMm Jun 10 '23 at 18:20
  • @RbMm the full phrase says "I want to make sure the files I have to write to (...) are fully allocated." So sure, in the end, I want a non-sparse file, but I start with a sparse file. – lvella Jun 10 '23 at 18:25
  • But regarding NTFS, the term "sparse file" is overloaded. When the Microsoft docs say a file is sparse, it doesn't necessarily mean it has holes in its allocation. Instead, it means the file has a flag set saying it is able to be sparse, so that you may increase its size or poke holes in it to get unallocated pages. Without this flag, the file is always non-sparse. I don't care whether the file has the flag or not, but I do want it to have no holes. – lvella Jun 10 '23 at 18:28
  • You can use `FSCTL_SET_SPARSE` to convert the file to a non-sparse state, depending on whether `SetSparse` is true or false. – RbMm Jun 10 '23 at 18:30
  • @RbMm It can't be used to unset the flag. From the docs: "Passing FALSE in the FILE_SET_SPARSE_BUFFER structure will cause this function call to fail. The only way to clear this attribute is to overwrite the file (for example, by calling the CreateFile function with the CREATE_ALWAYS flag)." And as I said, I couldn't care less about the flag, I just want to fill the holes. – lvella Jun 10 '23 at 18:32
  • Copying the file would be an option. If you need to allocate disk space in place, I wouldn't know of a way other than issuing an `FSCTL_QUERY_ALLOCATED_RANGES` ioctl, followed by a write to every unallocated region. – IInspectable Jun 10 '23 at 18:42
  • More precisely: *A value of FALSE for this member is valid only on files that no longer have any sparse regions*. You can query the ranges and fill the empty ones with 0. – RbMm Jun 10 '23 at 18:44
  • https://stackoverflow.com/questions/7970333/how-do-you-pre-allocate-space-for-a-file-in-c-c-on-windows – Hans Passant Jun 10 '23 at 19:01
  • What's the point of this? To get one particular error instead of another? You could encounter an error writing the file even if it isn't sparse. Seems like a lot of hand-wringing for minimal benefit. Just write to the file, and if an error occurs then handle it appropriately. – Luke Jun 11 '23 at 14:24
  • @Luke Because the file will be memory mapped, so if I get an error trying to allocate the full file, I can handle it (tell the user the disk is full). But if I try to write to a memory address corresponding to a sparse region of a file, the error will happen inside the page fault that tries to allocate it, killing my process. – lvella Jun 11 '23 at 15:02
  • Can you not catch this exception and handle it? Last paragraph: https://learn.microsoft.com/en-us/windows/win32/memory/reading-and-writing-from-a-file-view – Luke Jun 12 '23 at 05:32
  • @Luke I am writing a library, but my users can certainly do it. Still, the best place for the application to react to the out of storage errors is where I am trying to do it, so the end user can do something about it. – lvella Jun 12 '23 at 14:47
  • There's nothing stopping you from catching this in a library. But yes, running out of disk space is awkward to deal with regardless of where it occurs. – Luke Jun 12 '23 at 17:30

1 Answer


Following the link from this documentation page, I could think of 3 ways of preallocating the sparse ranges of a file, but none of them is atomic like posix_fallocate(). I was hoping someone could point to an existing solution in the WinAPI.

Here they are:

Just copy the file

Copy the whole file to a new one, delete the old file, then rename the new file to the old name. The drawback is that this is always slow, because it has to read and write the whole file, and it may temporarily take twice the file's size on disk.

This could be improved a little by checking FILE_ATTRIBUTE_SPARSE_FILE first (for example with GetFileAttributes()), so you can skip the copy when the file can't be sparse.
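A minimal, portable sketch of the copy approach using only C stdio, assuming the caller supplies a temporary file name on the same volume (`rewrite_by_copy` and `tmp_path` are hypothetical names, not from the answer). On NTFS a newly created file is not sparse unless FSCTL_SET_SPARSE is applied to it, so writing every byte into the copy should leave it fully allocated. The FILE_ATTRIBUTE_SPARSE_FILE check is left out, and the remove/rename pair is not atomic.

```c
#include <stdio.h>

/* Hypothetical helper: rewrite `path` by copying it into `tmp_path`, deleting
 * the original and renaming the copy into place. A write failure (e.g. disk
 * full) is reported here, before the original is touched.
 * Returns 0 on success, -1 on failure. */
int rewrite_by_copy(const char *path, const char *tmp_path)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(tmp_path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    char buf[64 * 1024];
    size_t n;
    int ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) { /* e.g. out of disk space */
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0) /* flushing the last buffer can also fail */
        ok = 0;
    if (!ok) {
        remove(tmp_path);
        return -1;
    }
    /* rename() on Windows does not replace an existing file, so delete first.
     * There is a window here where neither name refers to the data's final file. */
    if (remove(path) != 0 || rename(tmp_path, path) != 0)
        return -1;
    return 0;
}
```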

Copy the file in place

Open the file twice, once for reading and once for writing, and alternate between reading from one handle and writing to the other until the whole file has been rewritten. The performance is as bad as the first solution, but at least it doesn't take more space than the full file size.

This can (maybe) be improved by reading and writing only one byte per cluster (if you know the cluster size), because the whole cluster has to be allocated anyway. Already-allocated clusters keep their old contents, and newly allocated clusters are automatically filled with the default value. I say maybe because writing one byte or a full cluster amounts to the same work for the NTFS layer, so the extra fseek() calls may not be worth it.
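The one-byte-per-cluster variant can be sketched with a single update stream instead of two handles, which is a simplification of what the answer describes. `touch_every_cluster` is a hypothetical helper, and the cluster size is assumed to be known already (on Windows it could be computed from the sectors-per-cluster and bytes-per-sector values returned by GetDiskFreeSpace()). Offsets are `long` for brevity; on Windows, files over 2 GiB would need _fseeki64/_ftelli64 instead.

```c
#include <stdio.h>

/* Hypothetical helper: read the first byte of every cluster and write the same
 * byte back. Writing into an unallocated (sparse) region forces it to be
 * allocated, so a lack of space should surface here as a failed write.
 * Returns 0 on success, -1 on any I/O error. */
int touch_every_cluster(const char *path, long cluster_size)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    int rc = size < 0 ? -1 : 0;
    for (long off = 0; rc == 0 && off < size; off += cluster_size) {
        /* C requires a seek between a read and a write on an update stream. */
        if (fseek(f, off, SEEK_SET) != 0) { rc = -1; break; }
        int c = fgetc(f);
        if (c == EOF) { rc = -1; break; }
        if (fseek(f, off, SEEK_SET) != 0 || fputc(c, f) == EOF) { rc = -1; break; }
        /* Push the byte to the OS now so a write error is tied to this cluster. */
        if (fflush(f) != 0) { rc = -1; break; }
    }
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

The file's contents are unchanged afterwards; only its allocation (on NTFS) is affected.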

Write zeros to the sparse region

As suggested in the comments on the question, you can use FSCTL_QUERY_ALLOCATED_RANGES to find the ranges where the file is allocated, and write zeros to the gaps between them. Actually, I've read somewhere that the value read back from unallocated ranges is not necessarily zero, so, to be safe, my implementation reads one byte from one of those regions and writes that value back into the gaps between allocations.

Again, only one byte per cluster is sufficient.
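The gap computation can be sketched independently of the ioctl. This assumes the allocated ranges have already been fetched with FSCTL_QUERY_ALLOCATED_RANGES into an array, sorted and non-overlapping (the DeviceIoControl call and its ERROR_MORE_DATA retry loop are not shown). `range_t` and `find_holes` are hypothetical names; `range_t` only mirrors the `FileOffset`/`Length` pair of FILE_ALLOCATED_RANGE_BUFFER.

```c
#include <stddef.h>

/* Mirrors the FileOffset/Length pair of FILE_ALLOCATED_RANGE_BUFFER. */
typedef struct {
    long long offset;
    long long length;
} range_t;

/* Given the allocated ranges of a file (sorted by offset, non-overlapping) and
 * the file size, store the unallocated gaps in `holes`, recording at most
 * `max_holes` of them. Returns the number of holes stored. */
size_t find_holes(const range_t *alloc, size_t n_alloc, long long file_size,
                  range_t *holes, size_t max_holes)
{
    size_t n = 0;
    long long pos = 0; /* end of the allocated data seen so far */
    for (size_t i = 0; i <= n_alloc; i++) {
        /* The gap ends where the next allocated range starts, or at EOF. */
        long long next = (i < n_alloc) ? alloc[i].offset : file_size;
        if (next > pos && n < max_holes) {
            holes[n].offset = pos;
            holes[n].length = next - pos;
            n++;
        }
        if (i < n_alloc && alloc[i].offset + alloc[i].length > pos)
            pos = alloc[i].offset + alloc[i].length;
    }
    return n;
}
```

Each hole can then be filled by writing one byte at `hole.offset`, `hole.offset + cluster_size`, and so on up to the end of the hole.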

Depending on how much of the file is allocated, the performance can be much better than the other methods.

lvella