14

Is it possible to get the size of a file in C# without using System.IO.FileInfo at all?

I know that you can get other things like Name and Extension by using Path.GetFileName(yourFilePath) and Path.GetExtension(yourFilePath) respectively, but apparently not file size? Is there another way I can get file size without using System.IO.FileInfo?

The only reason for this is that, if I'm correct, FileInfo grabs more info than I really need, so it takes longer to gather all those FileInfo objects when the only thing I need is the size of the file. Is there a faster way?

Blachshma
sergeidave
  • Premature optimization is the root of all evil. Use `FileInfo`, profile the code, and determine if it is fast enough for your needs. If you have verified that it is both a substantial percentage of the runtime of your application, and that your application is unacceptably slow, then consider other options. – Servy Jan 18 '13 at 21:31
  • I would imagine it's the file size taking the bulk of the time, with the other items coming along for the ride basically for free. – asawyer Jan 18 '13 at 21:32
  • Premature optimization is the root of all evil. Is this really causing an issue for you? – Kevin Jan 18 '13 at 21:32
  • @asawyer And that's assuming the information isn't lazily loaded to begin with. – Servy Jan 18 '13 at 21:32
  • @Servy Yep. Profile profile profile. – asawyer Jan 18 '13 at 21:33
  • I have a small application that gathers the size info and saves it into an array... but I often have half a million files, give or take and that takes a while to go through all of those files (I'm using FileInfo). I was just wondering if there was a faster way... – sergeidave Jan 18 '13 at 21:35
  • @sergeidave So how long does it take to run? How long does it need to run in for you to meet your requirements? – Servy Jan 18 '13 at 21:36
  • A well-known problem with FileInfo is that it only obtains the data that you ask for. But pretty convenient right now and the reason that trying to optimize it is pointless. – Hans Passant Jan 18 '13 at 21:37
  • @Servy Requirements can't provide you with possibility. I know what you're getting at, but the OP is looking to determine BAU: what they should expect. If the OP knows that `FileInfo` is generally 15% overhead without optimization X, I believe that is what they are after. – Aaron McIver Jan 18 '13 at 21:39
  • @AaronMcIver If you know that not doing optimization X is 15% slower, but your application spends .001% of its time doing that task, then there is no compelling reason to use that optimization. However, that is the reason I have just posted comments, and not an answer saying that he should just use `FileInfo`, because it is not an answer to the question, just the likely course of action the OP should take anyway. – Servy Jan 18 '13 at 21:43
  • `System.IO.FileInfo` uses Win32's `FindFirstFile` API call to extract a `WIN32_FIND_DATA` structure. You could use `GetFileSizeEx` but it requires a `HANDLE` which you must obtain from opening the file first. I would assume the former is better on performance. If you _really_ need insane performance, then try the Win32 calls to `FindFirstFile` (and `FindClose`) yourself. – Erik Jan 18 '13 at 21:48
  • @Servy, I have a meeting shortly but I will run some numbers and get back with specific results. Thank you! – sergeidave Jan 18 '13 at 21:49

6 Answers

10

I performed a benchmark using these two methods:

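    // Reads the size from the directory entry; the file itself is never opened.
    public static long GetFileSizeA(string filename)
    {
        WIN32_FIND_DATA findData;
        IntPtr findHandle = FindFirstFile(filename, out findData);
        if (findHandle == new IntPtr(-1)) // INVALID_HANDLE_VALUE
            throw new IOException("FindFirstFile failed for " + filename);
        FindClose(findHandle); // release the search handle (P/Invoked too)
        // Combine both halves so files of 4 GB or more are not truncated.
        return ((long) findData.nFileSizeHigh << 32) | findData.nFileSizeLow;
    }

    // Opens the file and queries the size through the handle.
    public static long GetFileSizeB(string filename)
    {
        IntPtr handle = CreateFile(
            filename,
            FileAccess.Read,
            FileShare.Read,
            IntPtr.Zero,
            FileMode.Open,
            FileAttributes.ReadOnly,
            IntPtr.Zero);
        if (handle == new IntPtr(-1)) // INVALID_HANDLE_VALUE
            throw new IOException("CreateFile failed for " + filename);
        long fileSize;
        bool ok = GetFileSizeEx(handle, out fileSize);
        CloseHandle(handle);
        if (!ok)
            throw new IOException("GetFileSizeEx failed for " + filename);
        return fileSize;
    }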

Running against a bit over 2300 files, GetFileSizeA took 62-63ms to run. GetFileSizeB took over 18 seconds.

Unless someone sees something I'm doing wrong, I think the answer is clear as to which method is faster.

Is there a way I can refrain from actually opening the file?

Update

Changing FileAttributes.ReadOnly to FileAttributes.Normal reduced the timing so that the two methods were identical in performance.

Furthermore, if you skip the CloseHandle() call, the GetFileSizeEx method becomes about 20-30% faster, though I wouldn't recommend that, since every call then leaks a file handle.
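
The two methods above call Win32 functions whose P/Invoke declarations are not shown in the answer. The following is a sketch of what they could look like, modeled on the commonly published kernel32 signatures; the class name `NativeMethods` and the helper `CombineSize` are made up for illustration and are not part of the original answer. Passing the `System.IO` enums straight through relies on their numeric values lining up with the Win32 constants (for example `FileMode.Open` is 3, the same as `OPEN_EXISTING`), which is an assumption worth checking against your own usage.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

// Hypothetical container for the declarations; the name is illustrative only.
static class NativeMethods
{
    // Mirrors the Win32 WIN32_FIND_DATA structure (Unicode variant).
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool FindClose(IntPtr hFindFile);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr CreateFile(
        string lpFileName,
        [MarshalAs(UnmanagedType.U4)] FileAccess dwDesiredAccess,
        [MarshalAs(UnmanagedType.U4)] FileShare dwShareMode,
        IntPtr lpSecurityAttributes,
        [MarshalAs(UnmanagedType.U4)] FileMode dwCreationDisposition,
        [MarshalAs(UnmanagedType.U4)] FileAttributes dwFlagsAndAttributes,
        IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool GetFileSizeEx(IntPtr hFile, out long lpFileSize);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool CloseHandle(IntPtr hObject);

    // Illustrative helper: joins the two 32-bit halves reported in
    // WIN32_FIND_DATA into a single 64-bit size.
    public static long CombineSize(uint high, uint low)
    {
        return ((long)high << 32) | low;
    }
}
```

With these in scope (for example via `using static NativeMethods;` in newer C# versions, or by placing the methods in the same class), `GetFileSizeA` and `GetFileSizeB` should compile on Windows. They will not work on other platforms because kernel32.dll is Windows-only.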

Pete
  • It can be further improved by using FindFirstFileEx and limiting the search if possible. – SmartK8 Jan 18 '13 at 23:55
  • @SmartK8 How do you mean limit it? It's searching for a specific filename. In the benchmarking, I got a list of all the files in a directory and then called GetFileSizeA() or GetFileSizeB() with the full path and filename. – Pete Jan 19 '13 at 00:52
  • I assume that by "it takes longer to gather all those" the OP meant that he's grabbing many files. He didn't say he necessarily goes for only one file in one directory. So I'm pointing out that I agree he should use FindFirstFile (and FindNextFile) in that case, or possibly FindFirstFileEx, as it provides more options for specifying the search (only folders, large fetch, etc.). – SmartK8 Jan 19 '13 at 11:28
  • @Pete: I'm getting some reference errors when trying to test the methods you suggested. Do I have to call any particular libraries for these? Thanks. – sergeidave Jan 28 '13 at 16:34
  • @sergeidave Oops. Fixed the code. Change handle to an IntPtr instead. There's extra cost in using SafeHandle because it needs to be released. – Pete Jan 28 '13 at 19:02
  • @Pete : Could you please tell me where I can find the complete functions for 'FindFirstFile()', 'CreateFile()', 'GetFileSizeEx()', 'CloseHandle()'. I want to use your code in C#. – Koder101 Jan 11 '17 at 15:48
6

From a short test I did, I found that using a FileStream is just 1 millisecond slower on average than using Pete's GetFileSizeB (it took me about 21 milliseconds over a network share...). Personally, I prefer staying within the BCL whenever I can.

The code is simple:

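static long GetFileSize(string path)
{
    // FileShare.ReadWrite lets the call succeed while other processes have the file open.
    using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        return file.Length;
    }
}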
Alon Bar
3

As per this comment:

I have a small application that gathers the size info and saves it into an array... but I often have half a million files, give or take and that takes a while to go through all of those files (I'm using FileInfo). I was just wondering if there was a faster way...

Since you're finding the length of so many files you're much more likely to benefit from parallelization than from trying to get the file size through another method. The FileInfo class should be good enough, and any improvements are likely to be small.

Parallelizing the file size requests, on the other hand, has the potential for significant improvements in speed. (Note that the degree of improvement will be largely based on your disk drive, not your processor, so results can vary greatly.)
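
As a rough sketch of what parallelizing the requests might look like, the following uses PLINQ with `FileInfo`. The class name `ParallelSizes`, the method `GetSizes`, and the `degreeOfParallelism` parameter are illustrative and not from the answer; the right degree of parallelism depends on the storage and would need to be found by measurement.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper; names are illustrative only.
static class ParallelSizes
{
    // Returns the size of each file, in the same order as the input paths.
    // A missing or inaccessible file surfaces as an AggregateException
    // when the query is executed.
    public static long[] GetSizes(string[] paths, int degreeOfParallelism)
    {
        return paths
            .AsParallel()
            .AsOrdered()                                   // keep results aligned with the input
            .WithDegreeOfParallelism(degreeOfParallelism)  // cap concurrent file system requests
            .Select(p => new FileInfo(p).Length)
            .ToArray();
    }
}
```

A call such as `ParallelSizes.GetSizes(Directory.GetFiles(folder), 4)` would then return the sizes in one array. On a single spinning disk the gain may be small or even negative, which is why timing it against the sequential loop is worthwhile.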

Community
Servy
  • Actually, if he's gathering lots of files from individual directories, he may benefit from using FindFirstFile() and FindNextFile() to iterate through the files in a directory, though I have no numbers to back that up. – Pete Jan 18 '13 at 22:22
3

Not a direct answer...because I am not sure there is a faster way using the .NET framework.

Here's the code I am using:

  List<long> list = new List<long>();
  DirectoryInfo di = new DirectoryInfo("C:\\Program Files");
  FileInfo[] fiArray = di.GetFiles("*", SearchOption.AllDirectories);
  foreach (FileInfo f in fiArray)
    list.Add(f.Length);

Running that, it took 2709ms to run on my "Program Files" directory, which was around 22720 files. That's no slouch by any means. Furthermore, when I put *.txt as a filter for the first parameter of the GetFiles method, it cut the time down drastically to 461ms.

A lot of this will depend on how fast your hard drive is, but I really don't think that FileInfo is killing performance.

NOTE: I think this is only valid for .NET 4+
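
For anyone who wants to repeat this kind of measurement on their own machine, here is a minimal sketch that times the scan with `Stopwatch`. It uses `DirectoryInfo.EnumerateFiles` (available from .NET 4) instead of `GetFiles` so that sizes are collected while the directory tree is still being walked; that substitution, and the names `SizeScan`, `CollectSizes` and `TimeScan`, are my own and not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; names are illustrative only.
static class SizeScan
{
    // Collects the length of every file under root that matches pattern.
    // Protected subdirectories can cause an UnauthorizedAccessException.
    public static List<long> CollectSizes(string root, string pattern)
    {
        var sizes = new List<long>();
        var dir = new DirectoryInfo(root);
        foreach (FileInfo f in dir.EnumerateFiles(pattern, SearchOption.AllDirectories))
        {
            sizes.Add(f.Length);
        }
        return sizes;
    }

    // Runs the scan once and reports how long it took.
    public static TimeSpan TimeScan(string root, string pattern, out int fileCount)
    {
        Stopwatch sw = Stopwatch.StartNew();
        List<long> sizes = CollectSizes(root, pattern);
        sw.Stop();
        fileCount = sizes.Count;
        return sw.Elapsed;
    }
}
```

Running `TimeScan` twice in a row is a cheap way to see how much of the first result was due to a cold file system cache.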

Jeff Johnson
  • How is this relevant? You're using the very methods the OP doesn't want to use. – aqua May 25 '13 at 01:28
  • @aqua It's relevant because it showcases that using FileInfo probably isn't going to drastically decrease performance. – Jeff Johnson May 28 '13 at 13:22
  • That's when dealing with local files. When dealing with network files using `FileInfo` is slow. There is a codeproject `FastFileInfo` that addresses this. – Loathing Aug 24 '15 at 03:07
  • @Loathing Thank you so much! That FastFileInfo was helping me, as I was getting 23.5k FileInfos through network, and it took about 20mins. Now it only takes 1:09 min O.o! This is amazing! – Keenora Fluffball Dec 17 '20 at 10:24
  • @KeenoraFluffball Good stuff. There is a recursion exception with the one on codeproject. I rewrote it and put it on sourceforge.net, and the sourceforge version should also be a little faster. – Loathing Dec 17 '20 at 21:53
1

A quick'n'dirty solution if you want to do this on the .NET Core or Mono runtimes on non-Windows hosts:

Include the Mono.Posix.NETStandard NuGet package, then something like this...

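using Mono.Unix.Native;

private long GetFileSize(string filePath)
{
    Stat stat;
    // stat returns 0 on success and -1 on failure (e.g. the file does not exist).
    if (Syscall.stat(filePath, out stat) != 0)
        throw new System.IO.IOException("stat failed for " + filePath);
    return stat.st_size;
}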

I've tested this running .NET Core on Linux and macOS - not sure if it works on Windows - it might, given that these are POSIX syscalls under the hood (and the package is maintained by Microsoft). If not, combine with the other P/Invoke-based answer to cover all platforms.

When compared to FileInfo.Length, this gives me much more reliable results when getting the size of a file that is actively being written to by another process/thread.

Mark Beaton
0

You can try this:

[DllImport("kernel32.dll")]
static extern bool GetFileSizeEx(IntPtr hFile, out long lpFileSize);

But that's not much of an improvement...

Here's the example code taken from pinvoke.net:

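IntPtr handle = CreateFile(
    PathString,
    GENERIC_READ,
    FILE_SHARE_READ,
    0,
    OPEN_EXISTING,
    FILE_ATTRIBUTE_READONLY,
    0); //PInvoked too

if (handle.ToInt32() == -1) // INVALID_HANDLE_VALUE: the file could not be opened
{
    return;
}

long fileSize;
bool result = GetFileSizeEx(handle, out fileSize);
CloseHandle(handle); // PInvoked too; release the handle whether or not the query succeeded
if (!result)
{
    return;
}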
edeboursetty
  • @Venson: definitely better than yours, I'm sorry to say ;) – edeboursetty Jan 18 '13 at 21:35
  • Of course, but that's what I said! That's just an idea; please READ before voting! – Venson Jan 18 '13 at 21:36
  • @Venson No, that's not what you said. You said they were the same. They're not. Yours is terrible, this is probably more annoying, but likely not (at least much) worse. – Servy Jan 18 '13 at 21:37
  • @sergeidave The two best methods are GetFileSizeEx() (as above) and FindFirstFile (which is what FileInfo uses). I don't know that there's any particular performance advantage to one over the other. But if performance is really critical, you should time the two methods and see which is actually faster. Using FileInfo may be as fast. – Pete Jan 18 '13 at 21:39
  • This is the same as `FileStream.Length`, just the less readable version. – Tim Schmelter Jan 18 '13 at 21:41
  • Thank you!! I will run some tests and time these options. – sergeidave Jan 18 '13 at 21:42
  • @TimSchmelter The concern of the OP is that `FileInfo` will also grab additional info; so this may find the file info just as quickly, but by not also querying for the last modified date (as an example) it might be quicker. Now, I doubt that's true (due to lazy loading) but it's at least something to address. – Servy Jan 18 '13 at 21:44
  • @Tim Schmelter - Not entirely true. FileStream.Length is not static and thus you have to instantiate and the instantiation of a FileStream does have some cost with it. – Pete Jan 18 '13 at 21:45