
I'm working on the APUE problem of writing a program similar to cp that copies files (Chapter 4, Problem 4.6). If the file contains holes (i.e., it is a sparse file), the '\0's in the gaps should never be copied. The ideal approach is to read and write block by block, with the block size determined by lseek(fd, current_off, SEEK_HOLE). I used /bin/ls as an example. But every time I lseek this file (or any other file), the file offset is always set to the end of the file. I've checked this post, but there don't seem to be any satisfactory answers. Here is my code:

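#define _GNU_SOURCE   /* glibc exposes SEEK_HOLE only when this is defined */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fd;
    off_t off;

    if ((fd = open("/bin/ls", O_RDONLY)) == -1)
        exit(EXIT_FAILURE);
    /* Assign first, then compare: (off = lseek(...)) == -1 */
    if ((off = lseek(fd, 0, SEEK_HOLE)) == -1)
        exit(EXIT_FAILURE);
    printf("%jd\n", (intmax_t)off);   /* off_t may be wider than int */
    close(fd);
    return 0;
}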

My kernel is Linux 3.13.0-rc3, pulled from the latest stable tree, and my file system is ext4. If lseek is unavailable, would it be proper to regard any '\0' as the beginning of a hole? Thanks for your answers.

  • I am not sure why the question got downvoted. It seems pretty reasonable. BTW, there's some useful info at http://lwn.net/Articles/440778/ – NPE Dec 16 '13 at 07:26
  • Sparse files are rare in the wild; your source file is probably not sparse. – R.. GitHub STOP HELPING ICE Dec 16 '13 at 07:30
  • Why do you think `ls` is a sparse file? It's 100% a regular file. Sparse files are usually lengthy log files. – egur Dec 16 '13 at 08:48
  • @egur Log files? I don't think so. Better look at database files or similar, or file systems stored in a regular file. Log files are usually written as they are. – glglgl Dec 16 '13 at 09:49
  • @glglgl Not just any log file; some applications may use this mechanism for very long log files. It's also sometimes used for filesystem journals. – egur Dec 16 '13 at 09:53


From 'man lseek' (man pages are your friend; they're the first place to look for information):

       SEEK_HOLE
          Adjust the file offset to the next hole in the file greater than
          or equal to offset.  If offset points into the middle of a hole,
          then the file offset is set to offset.  If there is no hole past
          offset,  then the file offset is adjusted to the end of the file
          (i.e., there is an implicit hole at the end of any file).

In other words, you're seeing entirely expected behavior. There's no hole in ls, so lseek moves the offset to the implicit hole at the end of the file.
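To see this directly, here is a minimal sketch (assuming Linux and glibc) that walks a file's data segments by alternating SEEK_DATA and SEEK_HOLE. The function name list_data_segments is hypothetical. On a file with no holes, such as /bin/ls, it should report a single segment that ends at the file size, which is exactly where the implicit hole sits. On a file system without native hole support, Linux treats the whole file as data, so the result looks the same.

```c
#define _GNU_SOURCE   /* glibc exposes SEEK_DATA/SEEK_HOLE only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: print every data segment of the file at `path` as
 * [start, end) and return how many were found, or -1 on error. A file
 * without holes reports one segment ending at EOF (the implicit hole). */
int list_data_segments(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0;
    int count = 0;

    while (end != -1 && pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);   /* next byte not in a hole */
        if (data == -1) {
            if (errno != ENXIO)                    /* ENXIO: only a hole remains */
                count = -1;
            break;
        }
        /* A hole always exists after data: at worst the implicit one at EOF. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == -1) {
            count = -1;
            break;
        }
        printf("data [%jd, %jd)\n", (intmax_t)data, (intmax_t)hole);
        count++;
        pos = hole;
    }
    if (end == -1)
        count = -1;
    close(fd);
    return count;
}
```

Calling list_data_segments("/bin/ls") on the asker's system should print a single line of the form data [0, size).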

You can create a sparse file for testing with dd:

dd if=/dev/zero of=sparsefile bs=1 count=1 seek=40G
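The same kind of test file can be created from C, which is closer to what the exercise asks for. This is a sketch assuming a file system that supports holes (ext4 does); make_sparse_file is a hypothetical helper name. It seeks past the end of a newly created file and writes a single zero byte, just as the dd command above writes one byte after skipping 40G.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` so that its first `hole_len` bytes
 * are a hole, followed by one written zero byte. Seeking past EOF and then
 * writing leaves the skipped range unallocated on file systems with holes. */
int make_sparse_file(const char *path, off_t hole_len)
{
    assert(hole_len >= 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char zero = '\0';
    int rc = 0;
    if (lseek(fd, hole_len, SEEK_SET) == -1 || write(fd, &zero, 1) != 1)
        rc = -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

Comparing ls -l (apparent size) with du (allocated size) on the resulting file shows the difference.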

As for your final question: no, that's not a reasonable approach. Files very often contain 0 bytes as ordinary data, and their presence does not indicate that the file is sparse.
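What makes a file sparse is how its blocks are allocated, not which bytes it contains. One rough way to check this, sketched below under the assumption that st_blocks counts 512-byte units (true on Linux, not guaranteed by POSIX), is to compare the allocated size with the apparent size, much as comparing du with ls -l does. looks_sparse is a hypothetical name, and file systems that compress data (for example btrfs with compression, or ZFS) can also report fewer allocated blocks than the file length, so treat the result as a hint only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical heuristic: return 1 if the file behind fd has fewer bytes
 * allocated than its apparent length (so part of it must be a hole),
 * 0 if not, and -1 if fstat() fails. Assumes 512-byte st_blocks units.
 * Zero bytes that were actually written occupy blocks like any data. */
int looks_sparse(int fd)
{
    assert(fd >= 0);
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    return (long long)st.st_blocks * 512 < (long long)st.st_size;
}
```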

Kristof Provost
  • Thanks for all your comments; the mails are pretty enlightening. I've checked the implementation of GNU cp. It uses the simplest approach: check every byte of each buffer it reads, and if the buffer is entirely zero, don't write that buffer to the target. My mistake was using du -s to get the size of the file and comparing it to the file length from ls -l. du displays its result in KB, so I thought du was reporting a size much smaller than the file's original length. In addition, the '\0's have to come in large runs before they can be recognized as a hole. – Dec 16 '13 at 09:44
  • @Hypeboyz Right, sparseness only applies if a whole block is zeroed out. – glglgl Dec 16 '13 at 09:51
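The zero-buffer approach described in the comment above can be sketched as follows. This is not GNU cp's actual code, only a minimal illustration of the idea: read the input in fixed-size blocks, lseek past any block that is entirely zero instead of writing it, and call ftruncate at the end so that a trailing hole still gives the output the right length. copy_sparse and all_zero are hypothetical names, and the sketch assumes out_fd is a freshly created, empty regular file positioned at offset 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* True if all n bytes of buf are zero: buf[0] == 0 and every byte equals
 * its successor. memcmp only reads, so the overlapping ranges are fine. */
static bool all_zero(const char *buf, size_t n)
{
    return n == 0 || (buf[0] == 0 && memcmp(buf, buf + 1, n - 1) == 0);
}

/* Copy in_fd to out_fd blksize bytes at a time. All-zero blocks are not
 * written; the output offset is advanced with lseek instead, leaving the
 * file system free to keep them as holes. Returns 0 on success, -1 on error. */
int copy_sparse(int in_fd, int out_fd, size_t blksize)
{
    assert(blksize > 0);
    char *buf = malloc(blksize);
    if (buf == NULL)
        return -1;

    off_t total = 0;
    int rc = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, blksize);
        if (n == 0)
            break;                                /* EOF */
        if (n == -1) {
            rc = -1;
            break;
        }
        if (all_zero(buf, (size_t)n)) {
            if (lseek(out_fd, n, SEEK_CUR) == -1) {
                rc = -1;
                break;
            }
        } else if (write(out_fd, buf, (size_t)n) != n) {
            rc = -1;                              /* error or short write */
            break;
        }
        total += n;
    }
    /* A trailing zero block was only seeked over, so set the length. */
    if (rc == 0 && ftruncate(out_fd, total) == -1)
        rc = -1;
    free(buf);
    return rc;
}
```

Choosing blksize equal to the file system block size (st_blksize from fstat) makes each skipped buffer line up with a whole block, which, as the last comment notes, is the only granularity at which a hole can exist.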