I need to determine the size in bytes of binary regular files under POSIX. I know how to do this with lseek() and with fstat():

```c
#include <fcntl.h>     // for open() and O_RDONLY
#include <stdio.h>     // for perror() and fprintf()
#include <stdlib.h>    // for EXIT_FAILURE
#include <sys/stat.h>  // for fstat() and S_ISREG()
#include <unistd.h>    // for lseek()

int fd = open("something.bin", O_RDONLY);
if (fd == -1)
{
    perror("Unable to open file to read");
    return EXIT_FAILURE;
}

// Using lseek()
const off_t size = lseek(fd, 0, SEEK_END);
if (size == (off_t) -1)
{
    perror("Unable to determine input file size");
    return EXIT_FAILURE;
}
// Don't forget to rewind
if (lseek(fd, 0, SEEK_SET) != 0)
{
    perror("Unable to seek to beginning of input file");
    return EXIT_FAILURE;
}
...

// Using fstat()
struct stat file_stat;
if (fstat(fd, &file_stat) != 0)
{
    perror("fstat failed");
    return EXIT_FAILURE;
}
if (!S_ISREG(file_stat.st_mode))
{
    // errno is not set in this case, so perror() would print a misleading message
    fprintf(stderr, "Input is not a regular file\n");
    return EXIT_FAILURE;
}
const off_t size = file_stat.st_size;
```

Why would I prefer one solution over the other?

Does one approach do more (perhaps unnecessary) work than the other?

Are there other POSIX-compliant or standard C solutions that should be preferred?

Luke Peterson

  • Note `lseek()` does not ensure a failure when "file is not a regular file". Seems like an important advantage of `fstat()`. – chux - Reinstate Monica Jan 17 '18 at 19:57
  • stat() and fstat() are fine. Why use anything else on POSIX systems? – Bjorn A. Jan 17 '18 at 20:06
  • [this](https://stackoverflow.com/questions/5957845/using-fseek-and-ftell-to-determine-the-size-of-a-file-has-a-vulnerability) seems to be your answer, but in my opinion, if you only want the size of a file, fstat() is clearer. – mariusz_latarnik01 Jan 17 '18 at 20:07
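For comparison, the standard C route discussed in the linked question uses only `fopen()`, `fseek()` and `ftell()`. Below is a minimal sketch under two stated caveats: ISO C does not require a binary stream to meaningfully support `fseek(..., SEEK_END)`, and `ftell()` returns a `long`, which may be too small for large files on some platforms. The helper name `stdio_file_size` is hypothetical.

```c
#include <stdio.h>   // for FILE, fopen(), fseek(), ftell(), fclose()

/* Hypothetical helper: returns the size in bytes of the file at `path`
 * using only standard C stdio calls, or -1 on error.
 * Caveats: ISO C does not require binary streams to meaningfully support
 * SEEK_END, and ftell() returns long, which may overflow for large files. */
long stdio_file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);   /* ftell() itself returns -1L on failure */

    fclose(fp);
    return size;
}
```

On POSIX systems this works for ordinary regular files, but unlike the `fstat()` version it cannot tell you whether the path is a regular file at all.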

2 Answers


stat() and fstat() read the file's metadata to retrieve its properties. How metadata is stored varies from file system to file system, but it is generally designed for fast access.

The file size is one of the properties stored in the metadata, and it is kept up to date by operations such as write or append. Furthermore, stat() doesn't require you to open() the file at all; fstat() works on a descriptor you already have.

On the other hand, open() plus lseek() (and a second lseek() to rewind) means more system calls, and could involve disk activity if the file is not in the operating system's page cache, which can make it noticeably more expensive.

Therefore I would recommend fstat().
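A minimal sketch of the path-based `stat()` variant: no file descriptor is needed, and the `S_ISREG()` check from the question is kept so that directories, FIFOs and devices are rejected. The helper name `file_size_by_path` is hypothetical.

```c
#include <stdio.h>      // for perror()
#include <sys/stat.h>   // for stat(), struct stat, S_ISREG()
#include <sys/types.h>  // for off_t

/* Hypothetical helper: returns the size of the regular file at `path`,
 * or -1 if stat() fails or the path is not a regular file.
 * No open() is needed: stat() reads the file's metadata by path. */
off_t file_size_by_path(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
    {
        perror("stat");
        return -1;
    }
    if (!S_ISREG(st.st_mode))
        return -1;   /* directory, FIFO, device, ... */
    return st.st_size;
}
```

If you already hold a descriptor (as in the question), `fstat(fd, &st)` is the drop-in replacement for `stat(path, &st)`.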

Nitin

I recommend using stat(2) or fstat(2) to get the size of a regular file (and in my opinion, the file size is by definition what stat reports in the .st_size field).

Some regular files are not that regular. For example, /proc/self/status or /proc/self/maps on a Linux system (read proc(5)) are reported as "regular files" by stat and ls. See this.

For such /proc/ pseudofiles, there is no simple way to get their "real" size, because stat(2) reports 0.
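When `st_size` is 0 for such a pseudofile, the only general fallback is to read the whole content and count the bytes. A minimal sketch, assuming the content is small enough that reading all of it is acceptable; for `/proc` files the count is only a snapshot and may differ on the next read. The helper name `count_bytes_by_reading` is hypothetical.

```c
#include <errno.h>      // for errno, EINTR
#include <fcntl.h>      // for open(), O_RDONLY
#include <sys/types.h>  // for ssize_t, off_t
#include <unistd.h>     // for read(), close()

/* Hypothetical helper: counts the bytes that read() delivers from `path`
 * until end of file. Works for /proc pseudofiles whose st_size is 0,
 * but costs a full read of the content. Returns -1 on error. */
off_t count_bytes_by_reading(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    char buf[4096];
    off_t total = 0;
    for (;;)
    {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;                 /* end of file */
        if (n == -1)
        {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return total;
}
```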

However, I believe that the file size is, almost by definition, what stat(2) tells you (and the fact that it "lies" on /proc/ is IMHO a deficiency of the /proc/ file system; /proc/self/maps actually behaves much like the read end of a pipe(7), not like a regular file).

Also consider the odd cases where another process is changing the file (e.g. write(2)-ing or ftruncate(2)-ing it). Then making several syscalls, as the open()/lseek() approach does, may give inconsistent results.

Finally, stat is the simplest (and often the fastest) way to get a file size. So why bother using anything else?

Basile Starynkevitch
  • Would it be useful in any way to try both ways? Apart from catastrophic failure (which I'd happily consider an outlier, just like attempting to get the length of a file while it is being written ...), could `ftell` fail where `fstat` would succeed, or the other way around? – Jongware Jan 17 '18 at 21:22
  • I don't think so, and most importantly, for me *by definition* the size of a file is what `stat` gives. I cannot find any better definition than this one. – Basile Starynkevitch Jan 17 '18 at 21:23
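To check the comment exchange empirically at the POSIX level (`lseek()` being the descriptor counterpart of `ftell()`), here is a minimal sketch that measures the size both with `fstat()` and with `lseek(fd, 0, SEEK_END)` on the same descriptor, saving and restoring the caller's file offset. For an ordinary regular file that no other process is modifying, the two values should agree. The helper `compare_sizes` and the `struct size_pair` type are hypothetical.

```c
#include <fcntl.h>      // for open(), O_RDONLY
#include <sys/stat.h>   // for fstat(), struct stat
#include <sys/types.h>  // for off_t
#include <unistd.h>     // for lseek(), close()

/* Hypothetical result type holding both size measurements. */
struct size_pair
{
    off_t by_fstat;   /* st_size reported by fstat() */
    off_t by_lseek;   /* offset returned by lseek(fd, 0, SEEK_END) */
};

/* Hypothetical helper: measures the size of the file open on `fd` both ways.
 * The current offset is saved and restored so the caller's position is kept.
 * Returns 0 on success, -1 if any call fails. */
int compare_sizes(int fd, struct size_pair *out)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    out->by_fstat = st.st_size;

    off_t saved = lseek(fd, 0, SEEK_CUR);    /* remember current position */
    if (saved == (off_t) -1)
        return -1;
    out->by_lseek = lseek(fd, 0, SEEK_END);
    if (out->by_lseek == (off_t) -1)
        return -1;
    if (lseek(fd, saved, SEEK_SET) != saved) /* restore position */
        return -1;
    return 0;
}
```

Note the lseek() route needs three calls (save, seek to end, restore) where fstat() needs one, which is the extra work the question asks about.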