461

How can I find out the size of a file I opened with an application written in C?

I would like to know the size because I want to put the content of the loaded file into a string, which I allocate using malloc().

Just writing malloc(10000*sizeof(char)); is IMHO a bad idea.

Matthias Braun
Nino
  • 46
    Note that sizeof(char) is 1, by definition. – Randy Proctor Oct 29 '09 at 13:44
  • 15
    Ya, but some esoteric platform's compiler might define char as 2 bytes - then the program allocates more than is necessary. One can never be too sure. – Nathan Osman Jan 05 '10 at 02:50
  • 38
    @George an "esoteric platform's compiler" where sizeof(char) != 1 is not a true C compiler. Even if a character is 32 bits, it will still return 1. – Andrew Flanagan Dec 06 '10 at 17:03
  • 26
    @George: The C (and C++) standard guarantees that `sizeof(char)==1`. See e.g. http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.1 – sleske Feb 08 '11 at 13:40
  • 58
    I actually prefer `malloc(x*sizeof(char));` to `malloc(x);` when allocating x characters. Yes, they always compile to the same thing, but I like consistency with other memory allocations. – moltenform Apr 16 '11 at 01:16
  • 1
    I would hope the optimizer can figure this out and do the right thing, thus using sizeof is safer and equivalent – Ben Jul 12 '12 at 00:07
  • 6
    @Ben: writing more than you need is not safer, it can be more dangerous. More code presents a greater surface for bugs to infect. If you *really* want safer, then use `p = malloc(N * sizeof (*p))` - don't hardcode the type where the compiler can't check it for you. – Bernd Jendrissek Jan 19 '14 at 11:23
  • 1
    You can use `fstat` with `fileno` if you have `FILE*`: `fstat(fileno(f), &stat)` – sshilovsky Feb 15 '14 at 23:49
  • It's worth remembering that the C standard _redefines the word byte to mean a char_, so it's best to just avoid talking about bytes in a C context at all. (Try octets instead. AFAIK the standard hasn't changed those.) – David Given May 20 '17 at 20:47

8 Answers

632

You need to seek to the end of the file and then ask for the position:

fseek(fp, 0L, SEEK_END);
sz = ftell(fp);

You can then seek back, e.g.:

fseek(fp, 0L, SEEK_SET);

or, if the goal is simply to return to the beginning:

rewind(fp);
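
The seek-and-tell steps above can be wrapped in a small helper with error checking. This is only a sketch: the name `stream_size` is made up for illustration, the stream is assumed to be seekable and opened in binary mode ("rb"), and the result is limited to what a `long` can hold. As the comments on this answer point out, the C standard does not guarantee that `SEEK_END` is meaningful on every implementation.

```c
#include <stdio.h>

/* Hypothetical helper (not a standard function): returns the size of the
 * stream in bytes, or -1L on failure. The current position is preserved. */
long stream_size(FILE *fp)
{
    long start, size;

    start = ftell(fp);                     /* remember where we were */
    if (start == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)      /* jump to the end */
        return -1L;
    size = ftell(fp);                      /* offset of the end == size */
    if (fseek(fp, start, SEEK_SET) != 0)   /* restore the old position */
        return -1L;
    return size;                           /* still -1L if the second ftell failed */
}
```

A caller would typically check for `-1L` before passing `size + 1` to malloc().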
Rob Walker
  • 13
    @camh - Thanks man. This comment solved a problem I had with a file sizing algorithm. For the record, one opens a file in binary mode by putting a 'b' at the end of fopen's mode string. – T.E.D. May 18 '10 at 10:42
  • 5
    LOL, yeah right, Windows inherited this stupid text/binary mode nonsense from DOS. This is easily forgotten nowadays. Actually the POSIX standard even mandates that any POSIX system must be able to cope with the "b" flag in fopen calls (to be compatible with the C standard!), but on the same hand it mandates, that the implementation must ignore it entirely, since this flag has no effect on POSIX systems (those don't know any such thing as a text mode and always open in binary mode). – Mecki Sep 09 '11 at 17:46
  • 71
    Yo uh, use [`rewind`](http://www.cplusplus.com/reference/clibrary/cstdio/rewind/) before people forget what it means – bobobobo Sep 23 '11 at 16:55
  • 134
    Returns a signed int, so limited to 2 GB. But on the plus side your file could be negative 2 billion bytes long, and they are prepared for that. – Seth Feb 13 '12 at 21:07
  • 27
    `length = lseek(fd, 0, SEEK_END)+1;` – Volodymyr M. Lisivka Nov 16 '12 at 16:24
  • 30
    From [fseek documentation](http://www.cplusplus.com/reference/cstdio/fseek/) "Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability)." – Mika Haarahiltunen Sep 02 '13 at 10:43
  • 4
    >2GB prob could be avoided using fseeko and ftello. If possible edit the answer.!! – Sandeep Apr 11 '14 at 02:57
  • 1
    @MikaHaarahiltunen [At least if you are working on a POSIX](http://pubs.opengroup.org/onlinepubs/009695399/functions/fseek.html) system, that is definitely not the case. (And I wouldn't trust cplusplus.com at all) – idmean May 11 '15 at 17:04
  • 1
    fseek returns the file pointer offset, so you don't need to use ftell. Just say "sz = fseek(fp, 0L, SEEK_END);". – micheal65536 Dec 15 '15 at 20:41
  • 10
    THIS IS NOT PORTABLE. DON'T USE THIS. IT'S NOT POSIX COMPLIANT – Ryan Sep 12 '16 at 02:59
  • 2
    @RobWalker : https://www.securecoding.cert.org/confluence/display/c/FIO19-C.+Do+not+use+fseek()+and+ftell()+to+compute+the+size+of+a+regular+file – user2284570 Nov 01 '16 at 23:16
  • @Mecki because Windows was first a DOS extension instead of a real operating system – phuclv Nov 26 '16 at 03:27
  • 10
    Note `fseek(fp, 0L, SEEK_END);` on a binary stream is not strictly-conforming, portable C code. Per [footnote 268 of the C standard](https://port70.net/~nsz/c/c11/n1570.html#note268): *"Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream..."*, and [`ftell()` on a text stream won't work](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.4p2): *"For a text stream, its file position indicator contains unspecified information ... not necessarily a meaningful measure of the number of characters written or read."* – Andrew Henle May 04 '18 at 19:28
  • 1
    https://wiki.sei.cmu.edu/confluence/display/c/FIO19-C.+Do+not+use+fseek%28%29+and+ftell%28%29+to+compute+the+size+of+a+regular+file – SetupX Jun 18 '18 at 03:06
  • @MichealJohnson did you mean `lseek()` instead of `fseek()`? The man page I'm referencing says, "Upon successful completion, `fgetpos()`, `fseek()`, `fsetpos()` return 0". – tomlogic Apr 14 '20 at 03:56
  • @tomlogic You would appear to be correct. I'm assuming I got the two confused as the answer was using `fseek`. Indeed the manual page for `fseek` says that it will return 0, while `lseek` will return the offset. Also, referring back to the code where I have used this technique myself to determine the size of a file, I have indeed used `lseek` and not `fseek`. – micheal65536 May 03 '20 at 11:47
  • @tomlogic Apart from the return value, the other main difference between these functions seems to be that one takes a `FILE*` file handle while the other takes an `int` file handle. This doesn't tend to matter on Linux (as long as you're consistent with which functions you use e.g. `fopen` vs `open`) but I've had issues with porting software to other platforms where the `FILE*` functions are supported but the `int` ones are not. I don't know the details behind this but I'm guessing one is a C standard and the other is a Linux (or POSIX? but I thought POSIX was supported on Windows) extension. – micheal65536 May 03 '20 at 11:53
  • @MichealJohnson: yes, `fopen()` is in the Standard C Library and `open` came from POSIX. https://stackoverflow.com/a/1658517/266392 does a good job of discussing the differences. – tomlogic May 04 '20 at 22:59
  • @VolodymyrM.Lisivka Why do you add one to the value returned by `lseek`? I have tested it without adding one, and it still equal to the output of `stat`. Yeah, bowelling outdated comments is my hobby :) – mathway Jun 14 '21 at 14:04
  • 1
    This answer is absolutely incorrect , and will silently break on esoteric platforms. See https://wiki.sei.cmu.edu/confluence/display/c/FIO19-C.+Do+not+use+fseek%28%29+and+ftell%28%29+to+compute+the+size+of+a+regular+file – user426 Sep 14 '21 at 05:51
  • @Seth Re: "Returns a signed int, so limited to 2 GB": [Why does fseek have "long int offset" instead of "long long int offset"?](https://stackoverflow.com/q/71020745/1778275). – pmor Nov 11 '22 at 14:13
  • For anyone seeing it here, you don't need ftell, lseek returns current position with one less syscall – Shahaboddin Aug 19 '23 at 13:17
451

Using standard library:

Assuming that your implementation meaningfully supports SEEK_END:

fseek(f, 0, SEEK_END); // seek to end of file
size = ftell(f); // get current file pointer
fseek(f, 0, SEEK_SET); // seek back to beginning of file
// proceed with allocating memory and reading the file

Linux/POSIX:

You can use stat (if you know the filename), or fstat (if you have the file descriptor).

Here is an example for stat:

#include <sys/stat.h>
struct stat st;
stat(filename, &st);
size = st.st_size;

Win32:

You can use GetFileSize or GetFileSizeEx.
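
For the POSIX route, a version of the `stat` example with the return value checked might look like the sketch below. The helper name `path_size` is an assumption made for illustration; it reports failures with perror() and returns -1 so the caller can tell an error from an empty file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: size of the file at `path` in bytes, or -1 on error. */
off_t path_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);          /* prints the path followed by the reason */
        return (off_t)-1;
    }
    return st.st_size;         /* length in bytes */
}
```

Note that `off_t` may be wider than `long`, so cast it (for example to `long long`) before printing it with printf().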

cubuspl42
Greg Hewgill
130

If you have the file descriptor, fstat() fills in a stat structure which contains the file size.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

// fd = fileno(f); //if you have a stream (e.g. from fopen), not a file descriptor.
struct stat buf;
fstat(fd, &buf);
off_t size = buf.st_size;
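
Tying this back to the question, here is one way the `fstat` result could be used to load a whole file into a malloc'd string. It is a sketch under several assumptions: the helper name `slurp` is invented, the stream is a regular file positioned at its start, nothing has been written through the stream since it was opened, and the platform is POSIX (for `fileno` and `fstat`).

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: reads all of `f` into a freshly malloc'd,
 * NUL-terminated buffer. Returns NULL on any failure; the caller frees. */
char *slurp(FILE *f, size_t *len_out)
{
    struct stat st;
    char *buf;
    size_t len;

    if (fstat(fileno(f), &st) != 0 || !S_ISREG(st.st_mode))
        return NULL;                    /* pipes, terminals etc. have no useful size */

    len = (size_t)st.st_size;
    buf = malloc(len + 1);              /* +1 for the terminating '\0' */
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, len, f) != len) { /* short read: error or file changed */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

The `S_ISREG` check is a design choice: for anything other than a regular file, `st_size` is not a reliable byte count, so the sketch refuses rather than guessing.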
Community
PiedPiper
  • 3
    Add "fd = fileno(f);" if you have a stream (e.g. from fopen), not a file descriptor. Needs error checking. – ysth Oct 26 '08 at 21:24
  • 18
    Of course it needs error checking - that would just complicate the example. – PiedPiper Oct 26 '08 at 21:28
  • 6
    this is in my opinion the best real answer, and i think we all have our training wheels off for the most part in C, do we really need error checking and other unnecessary code in our examples, its bad enough M$DN does it in theirs, lets not follow suit, instead just say at the end 'make sure to add error checking' and be done with it. – osirisgothra Nov 07 '13 at 16:45
  • 1
    If you call this with fileno(), it may be inaccurate due to file caching. I'm not aware of a method to get a FILE's length without causing the buffer to flush. – kainjow May 02 '14 at 15:21
  • 18
    a LOT of the users of SO are students of C, not past masters. Therefore, the code given in the answers should show the error checking, so the student learns the right way to code. – user3629249 Feb 23 '15 at 18:30
  • 5
    there is the detail that (f)stat() returns the block allocation total bytes while fseek() / ftell() sequence returns the number of bytes before EOF is encountered. – user3629249 Feb 23 '15 at 18:32
  • @user3629249: stat gives you both numbers. `st_size` is the real length, with byte granularity. `st_blocks` is the number of 512-byte disk blocks used by the file (including extra blocks for metadata, attributes, and even block-lists or extent-lists for large files where the list of blocks or extents doesn't fit in the inode itself.) Whether the FS actually allocates in 512B blocks or not, that's the unit stat uses. (https://man7.org/linux/man-pages/man2/lstat.2.html). For most filesystems, `st_size` is accurate, [but not on Linux `/proc` and `/sys`](https://stackoverflow.com/q/55826796) – Peter Cordes Nov 25 '21 at 11:25
27

I ended up just making a short and sweet fsize function (note: no error checking):

long fsize(FILE *fp){
    long prev=ftell(fp);      //ftell returns a long, not an int
    fseek(fp, 0L, SEEK_END);
    long sz=ftell(fp);
    fseek(fp,prev,SEEK_SET); //go back to where we were
    return sz;
}

It's kind of silly that the standard C library doesn't have such a function, but I can see why it'd be difficult, as not every "file" has a size (for instance /dev/null).

Earlz
18

How to use lseek/fseek/stat/fstat to get the file size:

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

void
fseek_filesize(const char *filename)
{
    FILE *fp = NULL;
    long off;

    fp = fopen(filename, "r");
    if (fp == NULL)
    {
        printf("failed to fopen %s\n", filename);
        exit(EXIT_FAILURE);
    }

    if (fseek(fp, 0, SEEK_END) == -1)
    {
        printf("failed to fseek %s\n", filename);
        exit(EXIT_FAILURE);
    }

    off = ftell(fp);
    if (off == -1)
    {
        printf("failed to ftell %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] fseek_filesize - file: %s, size: %ld\n", filename, off);

    if (fclose(fp) != 0)
    {
        printf("failed to fclose %s\n", filename);
        exit(EXIT_FAILURE);
    }
}

void
fstat_filesize(const char *filename)
{
    int fd;
    struct stat statbuf;

    fd = open(filename, O_RDONLY, S_IRUSR | S_IRGRP);
    if (fd == -1)
    {
        printf("failed to open %s\n", filename);
        exit(EXIT_FAILURE);
    }

    if (fstat(fd, &statbuf) == -1)
    {
        printf("failed to fstat %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] fstat_filesize - file: %s, size: %lld\n", filename, (long long) statbuf.st_size);

    if (close(fd) == -1)
    {
        printf("failed to close %s\n", filename);
        exit(EXIT_FAILURE);
    }
}

void
stat_filesize(const char *filename)
{
    struct stat statbuf;

    if (stat(filename, &statbuf) == -1)
    {
        printf("failed to stat %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] stat_filesize - file: %s, size: %lld\n", filename, (long long) statbuf.st_size);

}

void
seek_filesize(const char *filename)
{
    int fd;
    off_t off;

    if (filename == NULL)
    {
        printf("invalid filename\n");
        exit(EXIT_FAILURE);
    }

    fd = open(filename, O_RDONLY, S_IRUSR | S_IRGRP);
    if (fd == -1)
    {
        printf("failed to open %s\n", filename);
        exit(EXIT_FAILURE);
    }

    off = lseek(fd, 0, SEEK_END);
    if (off == -1)
    {
        printf("failed to lseek %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] seek_filesize - file: %s, size: %lld\n", filename, (long long) off);

    if (close(fd) == -1)
    {
        printf("failed to close %s\n", filename);
        exit(EXIT_FAILURE);
    }
}

int
main(int argc, const char *argv[])
{
    int i;

    if (argc < 2)
    {
        printf("%s <file1> <file2>...\n", argv[0]);
        exit(0);
    }

    for(i = 1; i < argc; i++)
    {
        seek_filesize(argv[i]);
        stat_filesize(argv[i]);
        fstat_filesize(argv[i]);
        fseek_filesize(argv[i]);
    }

    return 0;
}
chux - Reinstate Monica
lezard
  • 1
    or `if(off == (-1L))` no need for `(long)` – Imobilis Jun 01 '18 at 00:45
  • `ftell` returns a `long`, unfortunately. You need `ftello` to return an `off_t`. (Or apparently on Windows, [`_ftelli64()`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/ftell-ftelli64?view=msvc-170), because it seems they love to make it harder to write portable code.) See [discussion on another answer](https://stackoverflow.com/questions/8236/how-do-you-determine-the-size-of-a-file-in-c/37661250#comment123935146_37661250) – Peter Cordes Nov 25 '21 at 11:33
  • 1
    `fstat` only makes sense if you already have an open file, or as part of the process of opening it. Your `fstat_filesize` isn't something you'd ever want to use in that form, only if you were going to actually keep that `fd` around and read from it or something. open/`fstat`/close has zero advantage over `stat`; I'd have written that function to take a `FILE *fp` (use `fileno()`) or `int fd`. I guess your functions aren't intended to be used as-is because they only printf the results instead of returning them, though. – Peter Cordes Nov 25 '21 at 11:38
  • 1
    Also, since you're not passing `O_CREAT` to `open`, the 3rd arg is unused. `S_IRUSR | S_IRGRP` is not meaningful there. If `open` *was* going to create the file, that would give it `0440` aka `r--r-----` permissions (which would stop anything else from opening and writing to it), but it won't without `O_CREAT` so the `int open(const char *pathname, int flags);` form of the prototype applies. https://man7.org/linux/man-pages/man2/open.2.html – Peter Cordes Nov 25 '21 at 11:42
  • Other than the design of `fstat_filesize`, yeah this is a useful example of how to do error checking. Except you should `fprintf(stderr, ...` with your error messages. And in the functions using POSIX `stat` and friends, you should be using `strerror` as part of that to get an actual reason for the failure, like "no such file or directory" for `ENOENT` or "Permission Denied" for `EPERM`. That's much more useful and the standard way to report errors in Unix programs. (System call and file name is better than nothing, the user might not be thinking of permissions if you don't tell them.) – Peter Cordes Nov 25 '21 at 12:02
9

Have you considered not computing the file size and just growing the array if necessary? Here's an example (with error checking omitted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 1024

/* Read the contents of a file into a buffer.  Return the size of the file 
 * and set buf to point to a buffer allocated with malloc that contains  
 * the file contents.
 */
int read_file(FILE *fp, char **buf) 
{
  int n, np;      /* n: bytes read so far, np: current buffer capacity */
  size_t r;       /* bytes returned by the latest fread */
  char *b, *b2;

  n = 0;
  np = CHUNK;
  b = malloc(sizeof(char)*np);
  while ((r = fread(b + n, sizeof(char), CHUNK, fp)) > 0) {  /* append after existing data */
    n += r;
    if (np - n < CHUNK) { 
      np *= 2;                      // buffer is too small, the next read could overflow!
      b2 = malloc(np*sizeof(char));
      memcpy(b2, b, n * sizeof(char));
      free(b);
      b = b2;
    }
  }
  *buf = b;
  return n;
}

This has the advantage of working even for streams in which it is impossible to get the file size (like stdin).
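
A variant of the same grow-as-you-read idea, using realloc() and checking for errors, could look like the following sketch. The name `read_all` and the starting capacity of 1024 are arbitrary choices for illustration; the buffer doubles whenever it fills, so the total copying stays linear in the file size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads `fp` until EOF into a malloc'd, NUL-terminated
 * buffer that grows geometrically. Returns NULL on error; caller frees. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 1024, len = 0, r;
    char *buf = malloc(cap), *tmp;

    if (buf == NULL)
        return NULL;

    /* always leave one spare byte for the terminating '\0' */
    while ((r = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += r;
        if (len == cap - 1) {              /* buffer full: double it */
            if (cap > (size_t)-1 / 2 || (tmp = realloc(buf, cap * 2)) == NULL) {
                free(buf);                 /* overflow or out of memory */
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                      /* fread stopped because of an error */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

Because it never asks for the size up front, this also works on pipes and on stdin, for example `char *text = read_all(stdin, NULL);`.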

Pat Morin
  • 19
    Maybe the `realloc` function could be used here instead of using an intermediate pointer and having to `free()`. – Victor Zamanian Mar 13 '11 at 00:53
  • This has the very real disadvantage of being O(n^2) ... the size of the thing you have to copy grows. OK for small files, TERRIBLE for big ones. If you have a 1k chunk and a 100M file, you end up copying (if I did my math right) roughly 1E17 bytes. That may be a pathological example, but it demonstrates why you should not do this. – Floris Jan 27 '16 at 21:19
  • 3
    Unless I am misreading, the size being stored into doubles each time. The run-time is therefore O(n) rather than O(n^2). This is the same allocation strategy that is typically used for std::vector and its ilk. Regardless, reallocations are still less efficient than querying the file size and reading all at once. – Joe Apr 19 '16 at 02:48
  • This *is* doubling on each reallocation. Any constant factor resize greater than one is sufficient to get the O(n) bound, literal doubling is maybe overkill, to scale by 1.75 e.g. use `np += (np / 2) + (np / 4);` - all integer, intermediate results don't overflow "early". I'd more likely use 1.5, but 1.75 shows the idea better. Of course watch out for overflow, and particularly any multiple of the previous size may overflow when the actual size doesn't. If your file size is `(2^31)-1`, this will probably attempt to allocate a buffer with `-(2^31)` rather than `2^31` bytes. –  Nov 13 '16 at 17:19
  • I should probably warn that `np += (np / 2) + (np / 4)` doesn't give an exact multiply by 1.75 - results can be too small because no carry propagates from bits that were truncated away - but it should be good enough for this purpose. For multiplying by 1.5, `np += (np / 2);` should be correct. –  Nov 13 '16 at 17:50
8

If you're on Linux, seriously consider just using the g_file_get_contents function from glib. It handles all the code for loading a file, allocating memory, and handling errors.

GabrielF
Ben Combee
-42
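#include <stdio.h>

#define MAXNUMBER 1024

int main()
{
    int i = 0;
    int c;
    char a[MAXNUMBER];

    /* du -b prints the size in bytes, a tab, then the path */
    FILE *fp = popen("du -b  /bin/bash", "r");
    if (fp == NULL)
        return 1;

    /* copy characters up to the first tab, which ends the size field */
    while (i < MAXNUMBER - 1 && (c = getc(fp)) != EOF && c != '\t')
        a[i++] = (char)c;

    a[i] ='\0';

    printf(" a is %s\n", a);

    pclose(fp);
    return 0;
}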

HTH

plan9assembler