5

In C, we can find the size of a file using the fseek() function, like this:

if (fseek(fp, 0L, SEEK_END) != 0)
{
    //  Handle repositioning error
}

So my question is: is using fseek() and ftell() a recommended method for computing the size of a file?

msc
  • Related: http://stackoverflow.com/questions/8236/how-do-you-determine-the-size-of-a-file-in-c (read through the various answers). –  Sep 23 '16 at 12:56
  • 3
    Also a quote from the C Standard [ISO/IEC 9899:2011]: "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.". So the simple answer is no. –  Sep 23 '16 at 12:57
  • 2
    @Evert: The footnote you quoted is poorly worded. If footnotes were normative, it would immediately be flagged as a defect. The normative text of the C language standard says "A binary stream need not meaningfully support `fseek` calls with a whence value of `SEEK_END`." I.e. there's a possibility of it being unsupported on some platforms. But it is not unconditionally deemed to be UB. If that were the case, `SEEK_END` would become entirely unusable: text streams don't support it at all, and binary streams would produce UB. – AnT stands with Russia Sep 23 '16 at 17:47
  • 1
    For a text file, the return value from `ftell` is not necessarily a byte offset, so it's not necessarily a portable way to determine the size of the text file. It's simply specified that you can pass it back to `fseek` (with `SEEK_SET`) to get back to that spot in the stream. – Adrian McCarthy Sep 23 '16 at 17:59
  • Sorry, you have to call `fseek(2)` twice, as the `fseek(2)` system call tells you where the pointer was before moving it. – Luis Colorado Sep 24 '16 at 21:17

4 Answers

7

If you're on Linux or some other UNIX-like system, what you want is the stat function:

struct stat statbuf;
int rval;

rval = stat(path_to_file, &statbuf);
if (rval == -1) {
    perror("stat failed");
} else {
    printf("file size = %lld\n", (long long)statbuf.st_size);
}

On Windows under MSVC, you can use _stati64:

struct _stati64 statbuf;
int rval;

rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
    perror("_stati64 failed");
} else {
    printf("file size = %lld\n", (long long)statbuf.st_size);
}

Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.

dbush
  • If data was written to the stream and has not been flushed to the file yet, the file size returned by `stat` will not include it. – chqrlie Sep 24 '16 at 21:29
5

The fseek()/ftell() approach works sometimes.

if (fseek(fp, 0L, SEEK_END) == 0) {
  // fseek() returns 0 on success, so report the position only in that case
  printf("Size: %ld\n", ftell(fp));
}

Problems.

  1. If the file size exceeds LONG_MAX, the long int returned by ftell(FILE *stream) cannot represent it, so the result is problematic.

  2. If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information," C11dr §7.21.9.4 2

  3. If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268 (as quoted by @Evert). This mostly affects older platforms rather than current ones, but it is still part of the spec.

  4. If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
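With those caveats in mind, a more defensive version of the fseek()/ftell() idiom might look like the sketch below. The helper name `stream_size` is hypothetical, and the code assumes a seekable stream opened in binary mode on a platform where `SEEK_END` is meaningfully supported; it does nothing about the LONG_MAX limit beyond reporting whatever ftell() returns.

```c
#include <stdio.h>

/* Hypothetical helper: returns the position ftell() reports at the end of
 * the stream, or -1L on failure. Assumes fp is seekable, opened in binary
 * mode, and that SEEK_END is meaningfully supported on this platform. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);              /* remember the current position */
    if (saved == -1L)
        return -1L;

    if (fseek(fp, 0L, SEEK_END) != 0)    /* non-zero means the seek failed */
        return -1L;

    long size = ftell(fp);               /* -1L if the position is not representable */

    if (fseek(fp, saved, SEEK_SET) != 0) /* put the stream back where it was */
        return -1L;

    return size;
}
```

Checking both return values matters here: fseek() fails on non-seekable streams such as pipes, and ftell() reports failure separately.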

The usual solution for finding the file size is a non-portable, platform-specific one; @dbush's answer is a good example.

Note: If code attempts to allocate memory based on file size, the memory available can easily be exceeded by the file size.

Due to these issues, I do not recommend this approach.

Typically the problem should be re-worked to not need to find the file size, but to grow the data as more input is processed.
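As a rough illustration of growing the data as input arrives, the sketch below reads a stream into a buffer that doubles when full, so the file size is never queried. The name `read_all` and the initial capacity of 4096 bytes are arbitrary choices made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from fp into a malloc'd buffer that
 * grows as needed. On success returns the buffer and stores the byte count
 * in *len_out; on failure returns NULL. The caller frees the buffer. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0;          /* initial capacity is arbitrary */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)                   /* short read: end of file or error */
            break;
        if (cap > (size_t)-1 / 2) {      /* doubling would overflow size_t */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {                    /* distinguish a read error from EOF */
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

This also works for input that has no meaningful size, such as stdin or a pipe, which is the case described in problem 4.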


LL disclaimer: Note that C spec footnotes are informative and so not necessarily normative.

chux - Reinstate Monica
  • Sorry, you have to call `fseek(2)` twice, as the `fseek(2)` system call tells you where the pointer was before moving it. – Luis Colorado Sep 24 '16 at 21:16
  • @LuisColorado: unlike `lseek`, `fseek` does not return the file position. – chqrlie Sep 24 '16 at 21:26
  • oops... you're right. sorry. but remember that they are library functions that finally result in system calls. so `ftell` results in two syscalls `fseek(2)` finally, with most probability. – Luis Colorado Sep 24 '16 at 21:35
  • Sorry, read `lseek(2)` (the system call) where I wrote `fseek(2)` previously as some errata in writing. – Luis Colorado Sep 24 '16 at 21:41
0

The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
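A minimal sketch of that approach on a POSIX system, for a file that is already open as a stdio stream. The helper name `open_file_size` is made up for this example, and the check for a regular file is an added assumption, since st_size is not meaningful for things like pipes or terminals.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno(); assumes a POSIX system */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (POSIX only): size in bytes of the regular file behind
 * an open stream, or -1 on failure. Data still sitting in the stdio buffer
 * of an output stream is not counted unless the caller flushes it first. */
long long open_file_size(FILE *fp)
{
    struct stat sb;
    int fd = fileno(fp);                 /* descriptor underlying the stream */

    if (fd == -1 || fstat(fd, &sb) == -1)
        return -1;
    if (!S_ISREG(sb.st_mode))            /* only regular files are handled here */
        return -1;
    return (long long)sb.st_size;
}
```

Unlike stat(), fstat() works on the descriptor you already hold, so the answer refers to the file you opened even if the path has since been renamed or replaced.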

GMichael
0

Well, you can estimate the size of a file in several ways:

  • You can read(2) the file from the beginning to the end, and the number of chars read is the size of the file. This is a tedious way of getting the size of a file, as you have to read the whole file to get the size. But if the operating system doesn't allow you to position the file pointer arbitrarily, then this is the only way to get the file size.
  • Or you can move the pointer to the end-of-file position with lseek(2), the system call underlying the fseek() you showed in the question. lseek(2) returns the resulting offset, so one call yields the size; a second call is needed only to restore the previous position.
  • Or you can use the stat(2) system call, that will tell you all the administrative information of the file, like the owner, group, permissions, size, number of blocks the file occupies in the disk, disk this file belongs to, number of directory entries pointing to it, etc. This allows you to get all this information with only one syscall.
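The second option can be sketched as follows for a POSIX system. POSIX specifies that lseek() returns the resulting offset measured from the start of the file, so the extra calls here only save and restore the caller's position. The names `fd_size` and `path_size_lseek` are hypothetical.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): size of the file behind fd, or
 * (off_t)-1 on failure. The current file offset is preserved. */
off_t fd_size(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);     /* where are we now? */
    if (saved == (off_t)-1)
        return (off_t)-1;

    off_t end = lseek(fd, 0, SEEK_END);       /* resulting offset = file size */
    if (end == (off_t)-1)
        return (off_t)-1;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)  /* restore the old offset */
        return (off_t)-1;
    return end;
}

/* Convenience wrapper: opens the path read-only just to measure it. */
off_t path_size_lseek(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return (off_t)-1;
    off_t size = fd_size(fd);
    close(fd);
    return size;
}
```

If the offset does not need to be preserved, the single `lseek(fd, 0, SEEK_END)` call is enough.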

Other methods you mention (like the ftell(3) stdio library call) will also work (with the same drawback of needing two system calls to set and retrieve/restore the file pointer), but they involve libraries that you are probably not using for anything else. It would be overly complicated to obtain a FILE * pointer (e.g. with fdopen(3)) from an int file descriptor just to be able to call ftell(3) on it (twice) and then fclose(3) it again.

Luis Colorado