
To my understanding, holes are probably maintained as metadata in the inode, and the disk itself is not actually filled with zeros.

  1. Can someone explain, with real-life usage examples, where holes in a file can be useful?

  2. Are holes the same as soft preallocation? From a disk-usage perspective, even though the disk space is not actually used, that space is also not available to other processes.

Edited by Jim Balter; asked by Jimm.
  • The question was closed before I finished typing my answer, so here goes: the real advantage of holes (in a VM scenario) shows up when you actually delete data from the virtual disk. Suppose you've used up the VM's 20 GB of disk space and you decide to delete some data. Without sparse file support, despite the deletion, the 20 GB still remain occupied on the underlying physical hard disk. But if the filesystem supports holes, the VM can 'punch' a hole corresponding to the deleted files, thereby freeing up physical disk space. Hole punching is supported by fallocate() on some filesystems. – itisravi Dec 21 '12 at 05:00
  • This may help: [What is a sparse file and why do we need it?](https://stackoverflow.com/questions/43126760/what-is-a-sparse-file-and-why-do-we-need-it) – Rick Jun 17 '20 at 04:58
  • SO again closing perfectly valid questions... – étale-cohomology Oct 26 '21 at 12:56

2 Answers


Files with holes are usually referred to as sparse files.

They are useful when a program needs to access a wide range of addresses (offsets) but is unlikely to touch all of the potential blocks. Virtualization products can use them to store virtual disks. Let's say you configure a virtual machine with a 20 GB disk that won't fill up with data quickly. It is much faster to create a 20 GB sparse file that initially uses only a couple of disk blocks, and then let the VM create a file system and store files on it gradually.
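A minimal sketch of the VM-disk case, assuming Linux and a filesystem with sparse-file support (ext4, XFS, Btrfs and tmpfs all qualify). Extending an empty file with `truncate()` records the new length without writing any blocks, so the logical size and the space actually allocated diverge. The file name `vm-disk.img` and the helper names are hypothetical; `st_blocks` is counted in 512-byte units on Linux.

```python
import os
import tempfile

GB = 10 ** 9


def create_sparse_image(path: str, size: int) -> None:
    """Create a file whose logical size is `size` bytes without writing data.

    On filesystems that support sparse files, the range added by
    truncate() is left as a hole: no data blocks are allocated for it.
    """
    with open(path, "wb") as f:
        f.truncate(size)


def usage(path: str) -> tuple[int, int]:
    """Return (apparent size, bytes actually allocated on disk)."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of the fs block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "vm-disk.img")  # hypothetical image name
        create_sparse_image(image, 20 * GB)
        apparent, allocated = usage(image)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On such a filesystem the allocated figure stays at zero or a few blocks while the apparent size is the full 20 GB; `ls -l` reports the former kind of number (apparent size) and `du` the latter (allocated space).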

A large sparse file can also have its disk usage reduced when some of its blocks are blanked (i.e. filled with null bytes). Instead of actually writing zeroes to those blocks, a sparse-file-aware program can remove them from the file (i.e. punch holes in it) with the very same effect, because unallocated blocks return zeroes when a program reads them.
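Punching a hole can be sketched as follows, assuming Linux, a libc that exports `fallocate` (e.g. glibc) and a 64-bit `off_t`. Python's `os` module does not wrap `FALLOC_FL_PUNCH_HOLE`, so this calls `fallocate(2)` through `ctypes`; the flag values are copied from `<linux/falloc.h>`, and `punch_hole` is a hypothetical helper name. On a filesystem without hole-punching support the call fails with `EOPNOTSUPP`.

```python
import ctypes
import ctypes.util
import os
import tempfile

# Flag values from <linux/falloc.h> (Linux-specific).
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
# Assumes a 64-bit off_t, as on 64-bit Linux.
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd: int, offset: int, length: int) -> None:
    """Deallocate [offset, offset + length) while keeping the file size.

    PUNCH_HOLE must be combined with KEEP_SIZE; afterwards the range
    reads back as zeroes but no longer occupies disk blocks.
    """
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def allocated_bytes(path: str) -> int:
    return os.stat(path).st_blocks * 512  # 512-byte units on Linux


if __name__ == "__main__":
    MIB = 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"\xab" * (8 * MIB))  # 8 MiB of non-zero data
            f.flush()
            os.fsync(f.fileno())          # make sure the blocks are really allocated
        before = allocated_bytes(path)
        with open(path, "r+b") as f:
            punch_hole(f.fileno(), 2 * MIB, 4 * MIB)  # "blank" the middle 4 MiB
        with open(path, "rb") as f:
            f.seek(2 * MIB)
            zeroed = f.read(4 * MIB) == bytes(4 * MIB)
        print(f"size={os.path.getsize(path)} allocated {before} -> "
              f"{allocated_bytes(path)} bytes, punched range reads as zeroes: {zeroed}")
```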

Sparse files are the opposite of preallocation: they implement what is called thin provisioning, also known as disk overcommitment. This allows creating more "virtual disk space" than the hardware actually provides, and adding more physical disks to grow the file system only when necessary.
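To make the contrast with preallocation concrete, here is a sketch under the same assumptions (Linux, sparse-capable filesystem) that creates two 64 MiB files: one thin-provisioned with `truncate()`, one preallocated with `os.posix_fallocate()`. Both report the same logical size, but only the preallocated one has its blocks reserved up front. File and helper names are illustrative.

```python
import os
import tempfile

MIB = 1024 * 1024


def make_sparse(path: str, size: int) -> None:
    # Thin provisioning: only the logical size is recorded, no blocks reserved.
    with open(path, "wb") as f:
        f.truncate(size)


def make_preallocated(path: str, size: int) -> None:
    # Preallocation: blocks are reserved now, before any data is written.
    # posix_fallocate also extends the file size to `size`.
    with open(path, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, size)


def allocated_bytes(path: str) -> int:
    return os.stat(path).st_blocks * 512  # 512-byte units on Linux


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        thin = os.path.join(d, "thin.img")
        thick = os.path.join(d, "thick.img")
        make_sparse(thin, 64 * MIB)
        make_preallocated(thick, 64 * MIB)
        for p in (thin, thick):
            print(f"{os.path.basename(p)}: size={os.path.getsize(p)} "
                  f"allocated={allocated_bytes(p)}")
```

The preallocated file takes its 64 MiB away from every other file immediately; the sparse one only consumes space as blocks are written, which is exactly why a filesystem holding many sparse files can later run out of space unexpectedly.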

jlliagre
  • Wikipedia seems to imply the opposite of your last paragraph. From what Wikipedia describes, it does not sound like overcommitment but rather preallocation. For example: "Disadvantages are that sparse files may become fragmented; file system free space reports may be misleading; filling up file systems containing sparse files can have unexpected effects (such as disk-full or quota-exceeded errors ..." http://en.wikipedia.org/wiki/Sparse_file – Jimm Dec 21 '12 at 00:38
  • I still don't get the advantage of sparse files in the context of VMs. Why not simply grow the file on a need basis? For example, if the user requested up to 20 GB of space for the VM, preallocate 1 GB, and at some threshold of actual usage, preallocate more. – Jimm Dec 21 '12 at 00:44
  • Well, the fact that these files may become fragmented and lead to unexpected disk-full situations is precisely because their space was not allocated. Had they been preallocated, there would have been no fragmentation. Sparse files are definitely overcommitment, and that is the opposite of what you mention in your question ("space is also not available for other processes"). With sparse files, the space remains available to other files; there is no reservation at all. – jlliagre Dec 21 '12 at 00:45
  • You are still missing the point. Growing on a need basis is precisely what sparse files provide. – jlliagre Dec 21 '12 at 00:47
  • You can increase the size of an existing non-sparse file at any time, as long as there is disk space. So what does creating a sparse file buy me? Would it reserve block addresses? It sounds like it does not reserve anything, so I am wondering what the purpose of creating one is. – Jimm Dec 21 '12 at 00:52
  • Yes, you kind of reserve addresses, but this reservation takes (almost) no disk space. The advantage is that the virtualized OS immediately sees a large disk, can create properly sized partitions on it, and can then lay out file systems in these partitions very economically. Should you choose the non-sparse file approach, you would not be able to have more than one growable partition (without using volume management, if available), and enlarging the file systems would add a pointless administrative burden. – jlliagre Dec 21 '12 at 01:03

Holes are "useful" in the sense that they reduce disk space use (they leave more space available). They aren't usable in any other sense. The existence of holes as part of a filesystem's representation is "useful" when one has sparse files that contain large blocks of zeroes.
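One way to see holes "making space available" is a copy that turns all-zero blocks into holes, similar in spirit to what `cp --sparse=always` does (this is not its implementation). The sketch assumes a 4096-byte block size and a sparse-capable filesystem; `sparse_copy` is a hypothetical helper. Skipping a zero block is just a `seek()` past it, and a final `truncate()` sets the size in case the file ends in a hole.

```python
import os
import tempfile

BLOCK = 4096  # assumed filesystem block size; holes are tracked per block


def sparse_copy(src: str, dst: str, block: int = BLOCK) -> None:
    """Copy src to dst, turning every all-zero block into a hole.

    Instead of writing a block of zeroes we seek past it; the skipped
    range stays unallocated and still reads back as zeroes.
    """
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(block):
            if chunk == zero[: len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole
            else:
                fout.write(chunk)
        # Seeking alone does not extend the file, so set the final size.
        fout.truncate(fin.tell())


if __name__ == "__main__":
    MIB = 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "dense.bin"), os.path.join(d, "sparse.bin")
        with open(src, "wb") as f:
            # 16 MiB of zeroes written explicitly, so they occupy real blocks.
            f.write(b"x" * MIB + bytes(16 * MIB) + b"y" * MIB)
        sparse_copy(src, dst)
        for p in (src, dst):
            st = os.stat(p)
            print(f"{os.path.basename(p)}: size={st.st_size} "
                  f"allocated={st.st_blocks * 512}")
```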

Holes don't have anything to do with pre-allocation. Pre-allocation reserves space on the disk for a file's data before the file actually contains that data. Holes are a representation of data: specifically, of blocks consisting solely of zeroes.
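Since holes are zero blocks that are simply not stored, Linux can report where they are: `lseek()` with `SEEK_DATA`/`SEEK_HOLE` (exposed in Python as `os.SEEK_DATA` and `os.SEEK_HOLE`) returns the next data or hole offset. Here is a sketch that lists a file's data and hole ranges, assuming a filesystem that tracks holes; on filesystems that don't, the kernel reports the whole file as data. `segments` is a hypothetical helper name, and boundaries come out rounded to the filesystem's block granularity.

```python
import errno
import os
import tempfile


def segments(path: str) -> list[tuple[str, int, int]]:
    """Return ("data" | "hole", start, end) ranges covering the whole file."""
    out = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:  # ENXIO: no more data after pos
                    raise
                data = size
            if data > pos:
                out.append(("hole", pos, data))
            if data >= size:
                break
            hole = os.lseek(fd, data, os.SEEK_HOLE)  # EOF counts as a hole
            out.append(("data", data, hole))
            pos = hole
    finally:
        os.close(fd)
    return out


if __name__ == "__main__":
    MIB = 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        with open(path, "wb") as f:
            f.truncate(3 * MIB)   # 3 MiB made entirely of hole
            f.seek(1 * MIB)
            f.write(b"z" * 4096)  # one block of real data in the middle
        for kind, start, end in segments(path):
            print(f"{kind:4} {start:>8} .. {end:>8}")
```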

Jim Balter