22

File holes are empty regions in a file that read back as null bytes but don't take up any disk space. As a result, the file's apparent size is larger than the space it actually occupies on disk.

However, I don't know how to create a file with holes to experiment with.

Amumu
  • Hmmmm I'm not getting yer drift here. A hole is a place where a file has nothing, so the file doesn't have anything, but the file size is larger than the space on disk? Say your file is 100 bytes long, but each of those bytes is null (as in binary 0's) shouldn't it take 100 bytes of space on the disk? – Zeke Hansell Mar 15 '11 at 17:21
  • @Zeke: Naively, yes. But some file systems include an optimization to store a file with lots of consecutive null bytes and avoid storing all those zeros physically. Don't ask me how the details work, I guess it's crammed in some file attributes. –  Mar 15 '11 at 17:23
  • Do you mean the opposite, a file with actual size on disk larger than its size (which it would have if it was saved optimally)? – ypercubeᵀᴹ Mar 15 '11 at 17:24
  • If this is the case, then you CAN'T create a sparse file. The sparse file gets created by the operating system based upon the data in the file. – Zeke Hansell Mar 15 '11 at 17:26
  • 2
    @Zeke Hansell: You're mistaken. There is a way to create such a file. Read the answers. :-) – Omnifarious Mar 15 '11 at 17:36
  • @Omnifarious - Yes, but the fact that the file has the hole is a function of the file system, not the program that created it. I think if you tried that using CYGWIN on windows you would probably NOT have the desired effect. – Zeke Hansell Mar 15 '11 at 17:41
  • @Omnifarious - FYI I just tried the solution given below using CYGWIN and got two files that are exactly the same size, and the size on disk reported by windows is the same as the file size. – Zeke Hansell Mar 15 '11 at 17:48
  • @Zeke Hansell: Yes, it's true that it's a function of the OS. But all of the major Unixes do this. And the question is tagged Linux. :-) – Omnifarious Mar 15 '11 at 17:49
  • 1
    @Omni - point taken. I don't currently have a unix system in front of me. The best I can manage is a poor imitation ;-) – Zeke Hansell Mar 15 '11 at 17:57

5 Answers

42

Use the dd command with a seek parameter.

dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes

That creates for you a file with a nice hole from byte 8192 to byte 28671.

Here's an example, demonstrating that indeed the file has holes in it (the ls -s command tells you how many disk blocks are being used by a file):

$ dd if=/dev/urandom bs=4096 count=2 of=fwh # fwh = file with holes
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00195565 s, 4.2 MB/s

$ dd if=/dev/urandom seek=7 bs=4096 count=2 of=fwh
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00152742 s, 5.4 MB/s

$ dd if=/dev/zero bs=4096 count=9 of=fwnh # fwnh = file with no holes
9+0 records in
9+0 records out
36864 bytes (37 kB) copied, 0.000510568 s, 72.2 MB/s

$ ls -ls fw*
16 -rw-rw-r-- 1 hopper hopper 36864 Mar 15 10:25 fwh
36 -rw-rw-r-- 1 hopper hopper 36864 Mar 15 10:29 fwnh

As you can see, the file with holes takes up fewer disk blocks, despite being the same size.
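The same comparison that ls -s makes can be done from C with stat(2). The following is only a minimal sketch: the function names file_usage and print_usage are made up for illustration, and it assumes st_blocks counts 512-byte units, which is the case on Linux but is not promised everywhere.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Report the apparent size (st_size) and the allocated size in bytes.
 * Assumption: st_blocks is counted in 512-byte units, as on Linux. */
int file_usage(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}

/* Print both numbers and flag files that use less space than their size. */
int print_usage(const char *path)
{
    long long apparent, allocated;

    if (file_usage(path, &apparent, &allocated) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: apparent %lld bytes, allocated %lld bytes%s\n",
           path, apparent, allocated,
           allocated < apparent ? " (sparse)" : "");
    return 0;
}
```

Calling print_usage("fwh") and print_usage("fwnh") on the two files from the session above should show the same apparent size for both but a smaller allocated size for fwh, provided the filesystem supports holes.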

If you want a program that creates such a file, here it is:

#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <fcntl.h>

int main(int argc, const char *argv[])
{
    char random_garbage[8192]; /* Don't even bother to initialize */
    int fd = -1;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
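        perror("Can't open file"); /* perror appends ": <reason>" itself */
        return 2;
    }
    /* Write 8192 bytes of data, skip 5 blocks (20480 bytes), write 8192 more. */
    /* The skipped range, bytes 8192 to 28671, becomes the hole. */
    if (write(fd, random_garbage, 8192) != 8192 ||
        lseek(fd, 5 * 4096, SEEK_CUR) == (off_t)-1 ||
        write(fd, random_garbage, 8192) != 8192) {
        perror("Can't create hole");
        close(fd);
        return 3;
    }
    close(fd);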
    return 0;
}

The above should work on any Unix. Someone else replied with a nice alternative method that is very Linux specific. I highlight it here because it's a method distinct from the two I gave, and can be used to put holes in existing files.

Omnifarious
  • 3
    Somebody downvoted this, and I have no idea why. Perhaps the uninitialized memory left a bad taste since it's normally such a dumb thing to do. – Omnifarious Mar 23 '13 at 01:11
8
  1. Create a file.
  2. Seek to position N.
  3. Write some data.

There will be a hole at the start of the file (up to, and excluding, position N). You can similarly create files with holes in the middle.

The following document has some sample C code (search for "Sparse files"): http://www.win.tue.nl/~aeb/linux/lk/lk-6.html
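The three steps listed in this answer can be sketched in C roughly as follows. The function name create_with_leading_hole and its parameters are made up for illustration; whether the skipped range really stays unallocated depends on the filesystem.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `path`, seek to offset `n`, and write `len` bytes of `data` there.
 * Bytes 0 .. n-1 are never written, so they form a hole on filesystems
 * that support sparse files. Returns 0 on success, -1 on error. */
int create_with_leading_hole(const char *path, off_t n,
                             const char *data, size_t len)
{
    /* Step 1: create the file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Step 2: seek to position N. Step 3: write some data. */
    if (lseek(fd, n, SEEK_SET) == (off_t)-1 ||
        write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For example, create_with_leading_hole("sparse.bin", 1 << 20, "end", 3) should produce a file whose size is 1 MiB plus 3 bytes, with the first mebibyte reading back as zeros.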

NPE
  • Thanks for the link. It's helpful not just for this question. – Amumu Mar 15 '11 at 17:49
  • I once saw a way to avoid step 3 and to create 100% sparse files, but I can't recall how it is done :-( – Martin Scharrer Jul 21 '11 at 19:16
  • 1
    Found it: Step 3 can be replaced by `ftruncate(fileno(outfile), ftell(outfile));`, i.e. truncate the file to its current offset after the `fseek` was done. This allows the end of the file to be sparse. – Martin Scharrer Jul 21 '11 at 20:07
8

Aside from creating files with holes, since ~2 months ago (mid-January 2011) you can also punch holes in existing files on Linux, using fallocate(2) with the FALLOC_FL_PUNCH_HOLE flag. See the LWN article, the git commit on Linus' tree, and the patch to Linux's manpages.
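A hedged sketch of what such a call might look like is below. The wrapper name punch_hole is made up for illustration. The code is Linux-only, and the call fails (typically with EOPNOTSUPP) on filesystems that don't support hole punching. Per the fallocate(2) man page, FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE           /* needed for the fallocate() declaration */
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>     /* FALLOC_FL_* flags on older glibc versions */
#include <sys/types.h>
#include <unistd.h>

/* Deallocate the byte range [offset, offset + len) of an existing file.
 * The file size is unchanged and the range reads back as zeros afterwards.
 * Returns 0 on success, -1 with errno set on failure. */
int punch_hole(const char *path, off_t offset, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    int rc = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                       offset, len);

    /* Preserve fallocate's errno across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}
```

Only whole filesystem blocks inside the range are actually freed, so offset and len are best chosen as multiples of the block size (often 4096).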

ninjalj
3

The problem is discussed in detail in section 3.6 of W. Richard Stevens' famous book "Advanced Programming in the UNIX Environment" (APUE for short). The lseek function, declared in unistd.h, is used here; it sets an open file's offset explicitly. The prototype of the lseek function is as follows:

off_t lseek(int filedes, off_t offset, int whence);

Here, filedes is the file descriptor, offset is the value to apply, and whence is one of three constants defined in the header file: SEEK_SET, meaning that the file's offset is set to offset bytes from the beginning of the file; SEEK_CUR, meaning that the offset is set to its current value plus the offset in the argument list; and SEEK_END, meaning that the file's offset is set to the size of the file plus the offset in the argument list.
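To make the three whence values concrete, here is a small sketch that writes 10 bytes and then records where each kind of seek lands. The function name demo_whence and the out array are made up for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 10 bytes to `path`, then store the offsets returned by three
 * successive lseek calls in out[0..2]. Returns 0 on success, -1 on error. */
int demo_whence(const char *path, off_t out[3])
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    /* The file is now 10 bytes long and the current offset is 10. */
    out[0] = lseek(fd, 4, SEEK_SET);    /* 4: counted from the start        */
    out[1] = lseek(fd, 3, SEEK_CUR);    /* 7: current offset 4 plus 3       */
    out[2] = lseek(fd, 100, SEEK_END);  /* 110: file size 10 plus 100       */
    close(fd);
    return 0;
}
```

Note that the last seek moves the offset well past the end of the file without changing the file's size; a hole only appears once something is written at that offset.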

An example that creates a file with a hole in C on UNIX-like OSs is as follows:

/*Creating a file with a hole of size 810*/
#include "apue.h" /*the book's own header: declares err_sys and defines FILE_MODE*/
#include <fcntl.h>

/*Two strings to write to the file*/    
char buf1[] = "abcde";
char buf2[] = "ABCDE";

int main(void)
{
    int fd; /*file descriptor*/

    if((fd = creat("file_with_hole", FILE_MODE)) < 0)
        err_sys("creat error");
    if(write(fd, buf1, 5) != 5)
        err_sys("buf1 write error");
    /*offset now 5*/

    if(lseek(fd, 815, SEEK_SET) == -1)
        err_sys("lseek error");
    /*offset now 815*/

    if(write(fd, buf2, 5) !=5)
        err_sys("buf2 write error");
    /*offset now 820*/

    return 0;
}

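In the code above, err_sys is the book's helper function for handling fatal errors from system calls. Both err_sys and the FILE_MODE permission constant come from the book's apue.h header, not from the standard library.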

clasnake
0

A hole is created when data is written at an offset beyond the current end of the file, or when the file is extended by truncating it to a size larger than its current size.
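The second case, extending a file by truncation, can be sketched as below. The function name create_all_hole is made up for illustration, and the sketch assumes ftruncate is allowed to grow a file, which holds on Linux and other XSI-conformant systems.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `path` as an empty file and extend it to `size` bytes without
 * writing any data, so the whole file is one hole where sparse files are
 * supported. Returns 0 on success, -1 on error. */
int create_all_hole(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Growing the file this way adds bytes that read back as zeros. */
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

After create_all_hole("empty.bin", 1 << 20), ls -l should report 1048576 bytes while ls -s typically reports zero blocks.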

syrus
  • 1
  • 1
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Nov 24 '21 at 08:21