30

I'm trying to get the file size of a large file (12 GB+), and I don't want to open the file to do so, as I assume this would eat a lot of resources. Is there a good API for this? I'm in a Windows environment.

Ajay
user1167566

5 Answers

53

You should call GetFileSizeEx which is easier to use than the older GetFileSize. You will need to open the file by calling CreateFile but that's a cheap operation. Your assumption that opening a file is expensive, even a 12GB file, is false.

You could use the following function to get the job done:

#include <windows.h>

// Returns the size of the file in bytes, or -1 on error.
__int64 FileSize(const wchar_t* name)
{
    // Opening the file does not read its contents; it only returns a handle.
    // CreateFileW is called explicitly because `name` is a wide string.
    HANDLE hFile = CreateFileW(name, GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return -1; // error condition, could call GetLastError to find out more

    LARGE_INTEGER size;
    if (!GetFileSizeEx(hFile, &size))
    {
        CloseHandle(hFile);
        return -1; // error condition, could call GetLastError to find out more
    }

    CloseHandle(hFile);
    return size.QuadPart;
}

There are other API calls that will return the file size without forcing you to create a file handle, notably GetFileAttributesEx. However, it's perfectly plausible that this function just opens the file behind the scenes.

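#include <windows.h>

// Returns the size of the file in bytes, or -1 on error,
// without explicitly creating a file handle.
__int64 FileSize(const wchar_t* name)
{
    WIN32_FILE_ATTRIBUTE_DATA fad;
    if (!GetFileAttributesExW(name, GetFileExInfoStandard, &fad))
        return -1; // error condition, could call GetLastError to find out more
    // The size is reported as two 32-bit halves; LARGE_INTEGER joins them.
    LARGE_INTEGER size;
    size.HighPart = fad.nFileSizeHigh;
    size.LowPart = fad.nFileSizeLow;
    return size.QuadPart;
}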
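
The nFileSizeHigh/nFileSizeLow pair used above (and by WIN32_FIND_DATA) is simply the upper and lower 32 bits of one 64-bit size, which is why LARGE_INTEGER can reassemble it. Here is a portable sketch of the same arithmetic, assuming std::uint32_t in place of DWORD; combine_size is a hypothetical name, not a Win32 function:

```cpp
#include <cstdint>

// Hypothetical helper: joins the nFileSizeHigh/nFileSizeLow halves into a
// single 64-bit size. std::uint32_t stands in for the Win32 DWORD type.
constexpr std::uint64_t combine_size(std::uint32_t high, std::uint32_t low)
{
    // Widen before shifting; shifting a 32-bit value left by 32 is undefined.
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// A 12 GiB file is 3 * 2^32 bytes: high half 3, low half 0.
static_assert(combine_size(3, 0) == 12884901888ULL, "12 GiB");
```

A file under 4 GiB has a high half of 0, which is why code that ignores nFileSizeHigh appears to work until it meets a file like the 12 GB one in the question.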

If you are compiling with Visual Studio and want to avoid calling Win32 APIs, you can use _wstat64.

Here is a _wstat64 based version of the function:

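#include <sys/types.h>
#include <sys/stat.h>

// Returns the size of the file in bytes, or -1 on error (MSVC CRT only).
__int64 FileSize(const wchar_t* name)
{
    struct __stat64 buf;
    if (_wstat64(name, &buf) != 0)
        return -1; // error, could use errno to find out more

    return buf.st_size;
}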

If performance ever becomes an issue for you, time the various options on all the platforms you target before reaching a decision. Don't assume that the APIs that don't require you to call CreateFile will be faster. They might be, but you won't know until you have timed them.
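
A minimal sketch of such a timing harness, assuming C++17 and using std::chrono::steady_clock. time_per_call is a hypothetical helper, not a library API; on Windows you would pass it lambdas wrapping the CreateFile/GetFileSizeEx, GetFileAttributesEx and _wstat64 versions above. Note that repeatedly querying the same file mostly measures the warm-cache path.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns the average wall-clock time per call, in
// nanoseconds, of `query` over `iterations` calls.
double time_per_call(const std::function<std::uintmax_t()>& query, int iterations)
{
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    // Storing each result in a volatile keeps the calls from being optimized away.
    volatile std::uintmax_t sink = 0;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = query();
    const std::chrono::duration<double, std::nano> elapsed = clock::now() - start;
    (void)sink;
    return elapsed.count() / iterations;
}
```

For example, with `<filesystem>` included, `time_per_call([] { return std::filesystem::file_size("big.bin"); }, 10000)` would time the C++17 standard-library query, where `big.bin` is a placeholder for a real file.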

Emily L.
David Heffernan
  • Of course, [`CreateFile()` can be rather slow if you're opening the file on slow media](http://blogs.msdn.com/b/larryosterman/archive/2004/05/24/140396.aspx) like network drives, but the slowness would be due to storage access latencies and not because of the fact that the file is huge. – In silico Jan 24 '12 at 17:46
  • @Insilico Or tape drives! But I believe opening the file is the only way to find the file size, at least on windows. – David Heffernan Jan 24 '12 at 17:49
  • @DavidHeffernan: No! The file size is in the header and thus in the directory. The FindFirstFile() as shown below will read that information without having to open the file. – Alexis Wilke Apr 21 '13 at 02:35
  • @Alexis Read Raymond's article to learn the details. The metadata contains a copy of the size but it can be out of date. The true size is in the file. http://blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx – David Heffernan Apr 21 '13 at 07:31
  • Floppy drives and damaged CDs are also slow media. Moreover, you may be enumerating thousands of not-massive files and having to open and close each one to get the size is cumbersome, especially since the size is already stored in the directory entry which could/should be cached in memory; another reason that FAT(32) and CDFS are still good. – Synetech Jun 20 '13 at 02:29
  • @Synetech There may be perf reasons on different file systems, but certainly on NTFS then the file size in the dir entry may not be accurate. – David Heffernan Jun 20 '13 at 06:09
  • Yes, that’s why I said FAT is still good (I know a lot of people have moved to NTFS, but this is just another reason that I like to use FAT32 for everything, other than the Windows drive which now requires NTFS). – Synetech Jun 20 '13 at 18:55
  • Everyone claiming that opening a file is a cheap operation should test this statement with 10'000 or 100'000 files and enjoy the result. – Anton Samsonov Aug 22 '14 at 16:07
  • @Anton The question asks about one file and the asker thinks that opening large files is more expensive than opening small files. – David Heffernan Aug 22 '14 at 16:14
  • Please take `std::wstring` arguments by const reference... you're doing memory copies on each call :S – Emily L. Oct 03 '16 at 14:16
  • @EmilyL. Apparently there is debate over that issue: http://stackoverflow.com/questions/10231349/are-the-days-of-passing-const-stdstring-as-a-parameter-over In any case, I don't think it's worth getting too exercised at this question, it being really about winapi. Thanks! – David Heffernan Oct 03 '16 at 14:26
  • @DavidHeffernan Winapi or not, I'd be happy if you showed good practice to new (or copy paste) C++ programmers who see your example code and are likely to copy it verbatim... Regarding that debate, this is not one of the cases where it is advisable to pass by value. If anything you should pass by `const wchar_t*` as all you really want is to call `.c_str()` anyway let the user decide where and if they want a memcpy. – Emily L. Oct 03 '16 at 14:49
  • @Emily OK. I'm really not an expert on C++ and am somewhat busy right now. Perhaps you could edit. – David Heffernan Oct 03 '16 at 15:09
  • So File Explorer has to open every file it displays the size of? Even as you scroll through thousands of file names? – Kyle Delaney Mar 15 '18 at 17:30
  • Does `GetCompressedFileSize` have to open the file too, even though that takes a file name and not a file handle? – Kyle Delaney Mar 15 '18 at 17:33
38

I've also lived with the fear of the price paid for opening a file and closing it just to get its size, so I decided to ask the performance counter how expensive the operations really are.

These are the numbers of cycles it took to execute one file size query on the same file with each of the three methods. I tested on two files, 150 MB and 1.5 GB, and saw ±10% fluctuations, so the results don't seem to be affected by the actual file size. (Obviously this depends on the CPU, but it gives you a good vantage point.)

  • 190 cycles - CreateFile, GetFileSizeEx, CloseHandle
  • 40 cycles - GetFileAttributesEx
  • 150 cycles - FindFirstFile, FindClose

The gist with the code I used is available here.

As we can see from this highly scientific :) test, the slowest is actually the file opener. Second slowest is the file finder, while the winner is the attributes reader. Now, in terms of reliability, CreateFile should be preferred over the other two. But I still don't like the concept of opening a file just to read its size... Unless I'm doing size-critical stuff, I'll go for the attributes.

PS: When I have time, I'll try reading the sizes of files that are open and being written to. But not right now...

CodeAngry
  • With regard to your **P.S.**: It appears that GetFileAttributesEx() does in fact return the correct file size while the file is still being updated by another process, making it the fastest (correct file size) choice. If it only had the last file changed time (not to be confused with the last write time), as well, this function would be perfect! – Michael Goldshteyn Dec 05 '19 at 20:41
  • @MichaelGoldshteyn What exactly is the last file changed time you mentioned in the above comment? Is there another API to get this time? – Gautam Jain May 02 '20 at 11:39
  • This is great to see some figures, but I suspect the real question is how much IO does each involved. It's not clear whether they are different in that respect. – O'Rooney Jun 02 '20 at 22:01
12

Another option is to use the FindFirstFile function:

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[])
{
   WIN32_FIND_DATA FindFileData;
   HANDLE hFind;
   // TEXT() makes the literal match LPCTSTR in both ANSI and Unicode builds.
   LPCTSTR lpFileName = TEXT("C:\\Foo\\Bar.ext");

   hFind = FindFirstFile(lpFileName, &FindFileData);
   if (hFind == INVALID_HANDLE_VALUE)
   {
      printf("File not found (%lu)\n", GetLastError());
      return -1;
   }
   else
   {
      // The size is reported as two 32-bit halves; combine them into 64 bits.
      ULONGLONG FileSize = FindFileData.nFileSizeHigh;
      FileSize <<= sizeof(FindFileData.nFileSizeHigh) * 8;
      FileSize |= FindFileData.nFileSizeLow;
      // %llu is required for a 64-bit value; %u would print only 32 bits.
      _tprintf(TEXT("file size is %llu\n"), FileSize);
      FindClose(hFind);
   }
   return 0;
}
RRUZ
  • Use `ULARGE_INTEGER` instead of twiddling the `ULONGLONG` bits manually, eg: `ULARGE_INTEGER ul; ul.LowPart = FindFileData.nFileSizeLow; ul.HighPart = FindFileData.nFileSizeHigh; ULONGLONG FileSize = ul.QuadPart;`. Also, `%u` expects a 32-bit `unsigned int` on Windows, you need to use `%Lu` instead for a 64-bit integer. – Remy Lebeau Jan 24 '12 at 23:25
  • I believe FindFirstFile retrieves the file size as recorded in the directory entry. Note that under some circumstances this may not be accurate, e.g., if the file is hard linked and was modified via a different hard link, or if another application has the file open and has modified it. See http://blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx – Harry Johnston Jan 25 '12 at 02:35
  • Presumably the issue that Harry points to is why the Delphi RTL stopped using FindFirstFile in its file size sys function. – David Heffernan Jan 25 '12 at 07:50
  • This method doesn't work for symbolic link, it returns zero. – Changming Sun Jun 14 '16 at 02:20
5

As of C++17, the standard library provides std::filesystem::file_size. (Then the implementer gets to decide how to do it efficiently!)

Davis Herring
0

What about the GetFileSize function?

Armen Tsirunyan
  • That requires opening the file, which the OP said is not desirable. – Remy Lebeau Jan 24 '12 at 23:26
  • @remy but the file is where the size is stored so the two requests in the question are contradictory – David Heffernan Jan 25 '12 at 07:40
  • Actually no, the file itself does not store the size. The filesystem stores it. `GetFileSize()` requires the file to be opened first, then it uses that handle to determine where the file is located in the filesystem so it can grab the size. If you use `FindFirstFile()` instead, it queries the filesystem without needing to open the file. – Remy Lebeau Jan 25 '12 at 18:28
  • @Remy Not according to Raymond: http://blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx Also, if you don't use name then there won't be a notification so you just end up talking to yourself! – David Heffernan Jan 25 '12 at 22:35