14

Consider a sparse file with 1s written to a portion of the file.

I want to reclaim the actual space on disk for these 1s as I no longer need that portion of the sparse file. The portion of the file containing these 1s should become a "hole" as it was before the 1s were themselves written.

To do this, I cleared the region to 0s. This does not reclaim the blocks on disk.

How do I actually make the sparse file, well, sparse again?

This question is similar to this one but there is no accepted answer for that question.

Consider the following sequence of events run on a stock Linux server:

$ cat /tmp/test.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>

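int main(int argc, char **argv) {
    int fd;
    char c[1024];

    (void)argv;
    /* No arguments: fill the buffer with 1s. Any argument (e.g. "clear"): fill it with 0s. */
    memset(c, argc == 1, sizeof c);

    fd = open("test", O_CREAT | O_WRONLY, 0777);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Skip the first 10000 bytes so they stay a hole, then write 1024 bytes. */
    if (lseek(fd, 10000, SEEK_SET) < 0 ||
        write(fd, c, sizeof c) != (ssize_t)sizeof c) {
        perror("lseek/write");
        close(fd);
        return 1;
    }
    close(fd);

    return 0;
}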

$ gcc -o /tmp/test /tmp/test.c

$ /tmp/test

$ hexdump -C ./test
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002710  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
00002b10

$ du -B1 test; du -B1 --apparent-size test
4096        test
11024       test

$ /tmp/test clear

$ hexdump -C ./test
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002b10

$ du -B1 test; du -B1 --apparent-size test
4096        test
11024       test

# NO CHANGE IN SIZE.... HMM....

EDIT -

Let me further qualify that I don't want to rewrite files, copy files, etc. If it is not possible to somehow free previously allocated blocks in situ, so be it, but I'd like to determine if such is actually possible or not. It seems like "no, it is not" at this point. I suppose I'm looking for sys_punchhole for Linux (discussions of which I just stumbled upon).

z8000
  • From what I've read of sparse files, the key determinant is not that the block is filled with 0s, but that it's never been written. Do you have any references to the contrary? – kdgregory Dec 30 '09 at 22:08
  • Portions of a sparse file never written to have no allocated blocks. But, my question is once I do allocate one or more blocks, how do I free them? I no longer need a portion of the sparse file and want to _give back_ previously allocated blocks. But I can't. Boo. – z8000 Dec 30 '09 at 22:13
  • `cp --sparse=always` ... sparse files are a hack based on the way systems manage storage; it's never a good idea to rely on a hack. If you need sparse data structures that are likely to have pieces come and go, I'd recommend looking for such a structure or writing it yourself. – kdgregory Dec 30 '09 at 22:20
  • I suppose you could also muck with the inode entries themselves ... haven't touched a filesystem at that level since the early 90s, so have nothing more to offer. – kdgregory Dec 30 '09 at 22:26

7 Answers

11

It appears that Linux has added a syscall called fallocate that can be used for "punching holes" in files. The implementations in individual filesystems seem to focus on using it to pre-allocate a larger contiguous range of blocks.

There is also the posix_fallocate call, which focuses only on the latter and is not usable for hole punching.
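
As a rough illustration of the hole-punching use of fallocate, here is a minimal sketch. It assumes Linux 2.6.38 or later with glibc, and a filesystem that implements `FALLOC_FL_PUNCH_HOLE` (XFS, ext4, Btrfs and tmpfs do in current kernels). The helper name `punch_hole` is hypothetical.

```c
#define _GNU_SOURCE   /* glibc only declares fallocate() and FALLOC_FL_* with this */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: deallocate the byte range [offset, offset + len)
 * without changing the file size. Returns 0 on success, -1 on failure
 * (for example EOPNOTSUPP when the filesystem cannot punch holes). */
int punch_hole(int fd, off_t offset, off_t len) {
    /* PUNCH_HOLE must be combined with KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len) != 0) {
        perror("fallocate");
        return -1;
    }
    return 0;
}
```

Only filesystem blocks that lie entirely inside the range are freed; partial blocks are just zeroed. Assuming 4096-byte blocks, as the `du` output in the question suggests, `punch_hole(fd, 10000, 1024)` on the question's file would zero the bytes but free nothing, so the range would need to be widened to whole block boundaries to see `du` drop.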

Christian
  • [Jim Paris from UNIX stackexchange wrote a script](http://unix.stackexchange.com/a/52029/4830) to resparsify a file in-place using this syscall. Here it is: https://gist.github.com/jimparis/3901942 – Vladimir Panteleev Apr 05 '15 at 10:01
  • Update from the linked UNIX.SE: _"as of util-linux 2.25, the fallocate utility on Linux has a -d/--dig-hole option for that."_ – gronostaj Dec 08 '20 at 18:50
4

Right now it appears that only NTFS supports hole-punching. This has historically been a problem across most filesystems. POSIX, as far as I know, does not define an OS interface to punch holes, so none of the standard Linux filesystems have support for it. NetApp supports hole punching through Windows in its WAFL filesystem. There is a nice blog post about this here.

For your problem, as others have indicated, the only solution is to copy the file, leaving out the blocks that contain only zeroes. Yes, it's going to be slow. Or write an extension for your filesystem on Linux that does this and submit a patch to the good folks on the Linux kernel team. ;)

Edit: Looks like XFS supports hole-punching. Check this thread.

Another really twisted option can be to use a filesystem debugger to go and punch holes in all indirect blocks which point to zeroed out blocks in your file (maybe you can script that). Then run fsck which will correct all associated block counts, collect all orphaned blocks (the zeroed out ones) and put them in the lost+found directory (you can delete them to reclaim space) and correct other properties in the filesystem. Scary, huh?


Disclaimer: Do this at your own risk. I am not responsible for any data loss you incur. ;)

Sudhanshu
2

Ron Yorston offers several solutions; but they all involve either mounting the FS read-only (or unmounting it) while the sparsifying takes place; or making a new sparse file, then copying across those chunks of the original that aren't just 0s, and then replacing the original file with the newly-sparsified file.
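
The copy-based approach described above can be sketched roughly as follows. This is not Ron Yorston's code, just an illustration under stated assumptions: a fixed 4096-byte chunk size (real filesystems may use a different block size), and a hypothetical helper name `sparse_copy`. The caller would still have to rename the new file over the original.

```c
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 4096  /* assumed block size; adjust for the real filesystem */

/* Return 1 if the buffer holds only zero bytes. */
static int all_zero(const char *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0) return 0;
    return 1;
}

/* Hypothetical helper: copy src to dst, seeking over all-zero chunks
 * instead of writing them, so they remain holes in dst. */
int sparse_copy(const char *src, const char *dst) {
    char buf[CHUNK];
    ssize_t n;
    off_t total = 0;
    int rc = -1;

    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (out < 0) { close(in); return -1; }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (all_zero(buf, (size_t)n)) {
            if (lseek(out, n, SEEK_CUR) < 0) goto done;  /* skip: leaves a hole */
        } else if (write(out, buf, (size_t)n) != n) {
            goto done;
        }
        total += n;
    }
    /* Seeking alone does not extend a file, so set the final size explicitly
     * in case the source ends with zeros. */
    if (n == 0 && ftruncate(out, total) == 0) rc = 0;
done:
    close(in);
    close(out);
    return rc;
}
```

This needs enough free space for a second copy of the non-zero data, which is exactly the cost the question is trying to avoid.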

It really depends on your filesystem though. We've already seen that NTFS handles this. I imagine that any of the other filesystems Wikipedia lists as handling transparent compression would do exactly the same - this is, after all, equivalent to transparently compressing the file.

James Polley
2

After you have "zeroed" some region of the file, you have to tell the file system that this region is intended to be a sparse region. So in the case of NTFS you have to call DeviceIoControl() for that region again. At least, that is how I do it in my utility, "sparse_checker".

For me the bigger problem is how to unset the sparse region back :).

Regards

opal
1

This way is cheap, but it works. :-P

  1. Read in all the data past the hole you want, into memory (or another file, or whatever).
  2. Truncate the file to the start of the hole (ftruncate is your friend).
  3. Seek to the end of the hole.
  4. Write the data back in.
C. K. Young
  • Ouch. So let me further qualify that I am looking for something that "scales" well. :) I don't want to rewrite files, copy files, etc. If it is not possible to somehow free previously allocated blocks in situ, so be it, but I'd like to determine if this is true or false. – z8000 Dec 30 '09 at 22:22
  • It depends on your filesystem. We've already seen that NTFS handles this. I imagine that any of the other filesystems [Wikipedia lists][1] as handling transparent compression would do exactly the same - this is, after all, equivalent to transparently compressing the file. [1] http://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies – James Polley Dec 30 '09 at 22:29
0

Unmount your filesystem and edit it directly, in a way similar to what debugfs or fsck do. Usually you need a driver for each filesystem in use.

vitaly.v.ch
-1

It seems like writing zeros (as in the referenced question) to the part you're done with is a logical thing to try. Here is a link to an MSDN page on NTFS sparse files that does just that to "release" the "unused" part. YMMV.

http://msdn.microsoft.com/en-us/library/ms810500.aspx

No Refunds No Returns