
I'm trying to access a file as a char array, either by memory-mapping it or by copying it into a buffer. Both approaches need the size of the file. Easy enough, I thought: just use fseek(file, 0, SEEK_END).

However, according to C++ Reference, "Library implementations [of fseek] are allowed to not meaningfully support SEEK_END", meaning that I can't rely on that method to get the size of a file.

Next I tried fstat, which is less portable but will at least produce a compile error rather than a runtime problem. However, The Open Group notes that fstat does not need to provide a meaningful value for st_size.

So: has anyone actually come across a system where these methods do not work?

rici
Patrick Jeeves
  • http://stackoverflow.com/questions/8236/how-do-you-determine-the-size-of-a-file-in-c – eddiem Nov 28 '16 at 20:57
  • Some files on Unix systems do not have meaningful file sizes because they aren't disk files. What is the size of a pipe? A socket? Any of the "virtual" files in `/proc/` and `/sys/`? That's why all the file-size functions have big warning footnotes on them - not all files have a meaningful size. – Colonel Thirty Two Nov 28 '16 at 20:59
  • @ColonelThirtyTwo Thank you, that makes sense. – Patrick Jeeves Nov 28 '16 at 21:24
  • @eddiem I have seen that, but it does not really answer the question I have, given that it uses the standard fseek and fstat solutions I mentioned. Furthermore, that just raises further questions in that fseek finds the end by looking for a '\0', but what sense does that make, if fread can correctly determine if it's at the end of a file? – Patrick Jeeves Nov 28 '16 at 21:28
  • You can `realloc` the buffer until `fread` returns `0` or you run out of memory (if file is not dynamic). – Weather Vane Nov 28 '16 at 21:45
  • @PatrickJeeves: Why do you think that fseek finds the end by looking for a NUL? On Unix-like systems, `fseek` does not read the file at all; it sets the position by using the result of an internal call to `stat` (although other implementations are possible). – rici Nov 28 '16 at 21:49
  • I agree that the provided question is not a duplicate of the question you actually ask in the body of your post, but it is clearly a duplicate of the question in the title. I took the liberty of editing the title and reopening the question. @ColonelThirtyTwo: I suggest you supply an answer based on your comment. – rici Nov 28 '16 at 21:54
  • Filesystems that support sparse files may not give the correct information that you need for those sparse files. – alvits Nov 28 '16 at 22:01
  • @alvits: `st_size` should be the offset of the last byte even if the file is sparse. The space occupied on disk is in the `st_blocks` field. – rici Nov 29 '16 at 02:54

1 Answer


The notes about file sizes not being reported reliably are there because, on Linux, there are many "files" for which "file size" is not a meaningful concept.

There are two main cases:

  • The file is not a regular file. In particular, pipes, sockets, and character device files are streams of data where data is consumed on read, and not put on disk, so a size does not make much sense.
  • The file system that the file resides on does not provide the file size. This is especially common in "virtual" filesystems, where the file contents are generated when read and, again, have no disk backing.

    To expand, filesystems do not necessarily keep file contents on disk. Since the filesystem API is a convenient API for expressing hierarchical data, and there are many tools for operating on files, it sometimes makes sense to expose data as a file hierarchy. For example, /proc/ contains information about processes (such as open files and used memory) and /sys/ contains driver-specific information and options (anything from sensor sampling rates to LED colors). With FUSE (Filesystem in Userspace), you can program a filesystem to do pretty much anything, from SSHing into a remote computer to exposing Twitter as a filesystem.

    For a lot of these filesystems, "file size" may not make much sense. For example, an LED driver might expose three files: red, green, and blue. They can be read to get the current color or written to change it. Now, is it really worth implementing a file size for them, since they are merely settings in RAM, don't have any disk backing, and can't be removed? Not really.

In summary, files are not necessarily "things on disk". For many of the more advanced usages of files, "file size" either does not make sense or is not worth providing.
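When the size cannot be trusted, the approach suggested in the comments on the question still works: grow a buffer with `realloc` and keep calling `fread` until it returns 0. Below is a minimal sketch of that idea; `read_all` is a hypothetical name, and the 4096-byte starting capacity and the doubling strategy are arbitrary choices.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption for illustration): reads from `fp`
 * until end of input, without asking for the file size first.
 * Returns a malloc'd buffer and stores its length in *len_out, or returns
 * NULL on a read error or allocation failure. The caller frees the buffer. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 4096;      /* arbitrary initial capacity */
    size_t len = 0;
    size_t n;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    /* cap - len is always > 0 here because the buffer grows when full. */
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {
            char *tmp;

            if (cap > SIZE_MAX / 2) {   /* doubling would overflow */
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {       /* fread stopped because of an error, not EOF */
        free(buf);
        return NULL;
    }

    *len_out = len;
    return buf;
}
```

This works for pipes and virtual files as well as regular ones, at the cost of copying the data; it is not a substitute for memory mapping, which does need a known size.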

Colonel Thirty Two