14

Before anyone complains of "duplicate": I've been checking SO quite thoroughly, but there seems to be no clean answer yet, although the question looks quite simple.

I'm looking for portable C code that can provide the size of a file, even if the file is bigger than 4 GB.

The usual method (fseek, ftell) works fine as long as the file stays below 2 GB. It's fairly well supported everywhere, so I'm trying to find something equivalent.

Unfortunately, the updated methods (fseeko, ftello) are not supported by all compilers. For example, MinGW misses them (and obviously so does MSVC). Furthermore, some comments make me believe that the return type (off_t) does not necessarily support sizes > 2 GB; it may depend on some external parameters, to be checked.

The unambiguous methods (fseeko64, ftello64) are not supported by MSVC, which provides its own equivalents, _fseeki64 & _ftelli64. This is already bad, but it gets worse: some Linux configurations seem to support these functions poorly at run time. For example, on my Debian Squeeze on PowerPC, using GCC 4.4, a "filesize" function built on fseeko64 always returns 0 (while it works fine on Ubuntu64). MinGW seems to return random garbage above 2 GB.

Well, I'm a bit clueless as far as portability is concerned. And if I need to write a bunch of #if / #else, then why not go straight to the OS- and compiler-specific methods in the first place, such as GetFileSize() for MSVC, for example?
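
For reference, here is roughly the kind of #if / #else construction I have in mind (just a sketch, not tested on every platform; on glibc, fseeko64/ftello64 are only declared with _LARGEFILE64_SOURCE or _GNU_SOURCE):

#include <stdio.h>

/* Sketch of the #if/#else approach described above.
   This is the construction that misbehaves on some configurations. */
long long filesize(const char *path)
{
    long long size = -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
#if defined(_MSC_VER)
    if (_fseeki64(f, 0, SEEK_END) == 0)
        size = _ftelli64(f);
#elif defined(__MINGW32__) || defined(__linux__)
    if (fseeko64(f, 0, SEEK_END) == 0)   /* needs _LARGEFILE64_SOURCE on glibc */
        size = ftello64(f);
#else
    if (fseek(f, 0, SEEK_END) == 0)
        size = (long long)ftell(f);      /* limited to < 2 GB */
#endif
    fclose(f);
    return size;
}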

Prof. Falken
Cyan
  • 3
    Well, what is your definition of "portable"? There are many systems that can't even open files. Even more that cannot open files over 4 GB in size. – Johan Kotlinski Jan 26 '12 at 23:21

6 Answers

9

You said it: there's no portable method; if I were you I'd just go with GetFileSize on Windows and stat on POSIX.
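
Something along these lines, for illustration (untested sketch; error handling kept minimal, and I'm using GetFileSizeEx since plain GetFileSize returns the high 32 bits separately):

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Windows: query the size through the Win32 API */
int64_t filesize(const char *path)
{
    LARGE_INTEGER size;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    if (!GetFileSizeEx(h, &size)) {
        CloseHandle(h);
        return -1;
    }
    CloseHandle(h);
    return (int64_t)size.QuadPart;
}

#else
#include <sys/stat.h>

/* POSIX: st_size is 64-bit when _FILE_OFFSET_BITS=64 (the default on 64-bit systems) */
int64_t filesize(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (int64_t)st.st_size;
}
#endif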

Matteo Italia
  • 2
    You could use `_stat64` on Windows to keep the code *sorta* the same. – user7116 Jan 26 '12 at 23:25
  • 1
    @sixlettervariables: correct, although I don't know whether every compiler on Windows implements it (whereas `GetFileSize` is part of the Windows API, so it should always be available). – Matteo Italia Jan 26 '12 at 23:28
8

You should be able to use stat64 on Linux and _stat64 on Windows to get file size information for files over 2 GB, and both functions are very similar in usage. You can use a couple of #defines to make stat64 work on Windows too:

#if __WIN32__
#define stat64 _stat64
#endif

However, although this should work, it should be noted that the _stat family of functions on Windows is really just a wrapper around other functions, so it adds some additional time and resource overhead.
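
For example, a minimal usage sketch (assuming the #define above; on Linux, stat64 is only declared when _LARGEFILE64_SOURCE or _GNU_SOURCE is defined):

#include <sys/types.h>
#include <sys/stat.h>

#if __WIN32__
#define stat64 _stat64   /* maps both the function and the struct name to the Windows ones */
#endif

long long filesize64(const char *path)
{
    struct stat64 st;
    if (stat64(path, &st) != 0)
        return -1;
    return (long long)st.st_size;   /* 64-bit on both platforms */
}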

Frxstrem
6
#include <stdio.h>

int ch;
FILE *f = fopen("file_to_analyse", "rb");
/* error checking omitted for brevity */
unsigned long long filesize = 0; /* or unsigned long for C89 compatibility */
while ((ch = fgetc(f)) != EOF) filesize++;
fclose(f);
/* error checking omitted for brevity */
pmg
  • 3
    Ok, it's the only standard-compliant way, but I hope you are being sarcastic: reading a whole file, possibly 2+ GB big, one character at a time just to know its size (which on current filesystems is simply an attribute of the file) is plain stupid... – Matteo Italia Jan 26 '12 at 23:22
  • 2
    Oh, no, no, no... please tell me you're kidding. On the other hand, the question is about a portable way, not about an efficient one. This is a portable way indeed. – Daniel Kamil Kozar Jan 26 '12 at 23:22
  • 2
    It's event-driven, which is the reason why it's so fast. – Matt Joiner Jan 26 '12 at 23:35
  • Why is this so bad? How else would you count all the bytes? You would have to iterate over them and actually count them to find out, right? – Gerard Apr 25 '14 at 11:49
  • 3
    @Gerard Because the filesystem counts bytes *when they are written*, and then stores the value away. That's how it knows what EOF is. Reading massive files in their entirety to determine size is slow; reading a pre-calculated field stored in the filesystem is fast. – Unsigned Jun 10 '14 at 02:52
  • @Gerard Do you have to read every page of a book to find out how many pages it has? And if you did, would you ask why that's so bad? – Jim Balter Jul 02 '15 at 22:58
3

I have implemented and tested the following:

#if __WIN32__
#define stat64 _stat64
#endif

Using the MinGW-w64 GCC 4.8.1 compiler and Linux GCC 4.6.3, this compiles and works.

On OS X, no redefinition of stat is required.

For the lstat and fstat functions, I expect similar #defines to work.

user33327
1
#include <sys/stat.h>
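
/* Note: on a 32-bit Linux build, off_t (and st.st_size) is only 64-bit when
   compiled with -D_FILE_OFFSET_BITS=64; otherwise sizes above 2 GB overflow. */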

off_t fsize(const char *filename) {
    struct stat st; 

    if (stat(filename, &st) == 0)
        return st.st_size;

    return -1; 
}
kichik
Roger
1

What about using lseek() (or _lseek()) with SEEK_END? It returns the resulting offset from the start of the file, which is the file size when seeking to the end.

Under Linux, _FILE_OFFSET_BITS needs to be defined as 64 for lseek() to return 64-bit values (which should be the default anyway).
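
A rough sketch of that idea (hypothetical helper name; on Windows the 64-bit variant is _lseeki64, and on 32-bit Linux _FILE_OFFSET_BITS=64 must be defined before any include):

#define _FILE_OFFSET_BITS 64   /* must appear before any system header on 32-bit Linux */
#include <stdio.h>
#include <fcntl.h>
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif

long long filesize_lseek(const char *path)
{
    long long size;
#ifdef _WIN32
    int fd = _open(path, _O_RDONLY | _O_BINARY);
    if (fd < 0) return -1;
    size = _lseeki64(fd, 0, SEEK_END);   /* 64-bit variant of _lseek */
    _close(fd);
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    size = lseek(fd, 0, SEEK_END);       /* off_t is 64-bit here thanks to _FILE_OFFSET_BITS */
    close(fd);
#endif
    return size;
}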

zwol
alk
  • I've not tried it yet. It seems lseek() might have the same sort of problem as fseeko(): the type used (off_t) may or may not support values above 2 GB, depending on some external configuration. – Cyan Jan 27 '12 at 14:29
  • @Attract: I tested this under 32/64bit linux using `gcc` and under 32bit win-vista using `VC10`. – alk Jan 27 '12 at 14:35