My program needs to get the size of a directory, including all its subdirectories and files. It does that by recursively enumerating all the objects in that directory and summing up their sizes. However, performance is unacceptable for large directories: it takes longer than the directory Properties dialog in Windows Explorer takes to calculate the size, and it makes the hard drive rattle much more. So, how can I optimize this process? Is there an appropriate WinAPI function or method for this?
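
For reference, here is a minimal sketch of the approach the question describes: recursively enumerate everything under a directory and add up the file sizes. It is not the asker's code, and it uses portable C++17 `std::filesystem` rather than `FindFirstFile`/`FindNextFile`; `DirSizeRecursive` is a hypothetical name. It counts logical file sizes (not allocated space on disk) and skips symbolic links, so a link is not counted next to its target. One relevant detail: the Win32 enumeration functions already return each file's size in `WIN32_FIND_DATA` (`nFileSizeHigh`/`nFileSizeLow`), and `std::filesystem::directory_entry` is allowed to cache attributes read during enumeration in the same way, so there should be no need to open each file just to learn its size.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: sums the logical size of every regular file below
// `root`. Symlinks are skipped so a link is not counted next to its target;
// the iterator does not descend into directory symlinks by default anyway.
std::uintmax_t DirSizeRecursive(const fs::path& root) {
    std::uintmax_t total = 0;
    std::error_code ec;
    // skip_permission_denied: ignore subdirectories we are not allowed to open.
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    for (; !ec && it != end; it.increment(ec)) {
        const fs::directory_entry& entry = *it;
        std::error_code attr_ec;
        if (entry.is_symlink(attr_ec) || !entry.is_regular_file(attr_ec)) {
            continue;
        }
        // The size may come from attributes cached during enumeration,
        // so this call need not open the file.
        const std::uintmax_t size = entry.file_size(attr_ec);
        if (!attr_ec) {
            total += size;
        }
    }
    return total;  // 0 if `root` could not be opened
}
```
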
- How you're doing it is how the Properties dialog does it, as far as I know. The file system will cache the directory contents, so if you run your program first, the Properties dialog will seem much faster because the data is already in memory. – Jonathan Potter Sep 27 '14 at 21:56
- @JonathanPotter: I understand there is some caching involved, and I've repeated the experiment enough times to see clearly that even if Properties does the same thing, it does it in a more optimized way. I'm just not sure how to optimize mine. – Violet Giraffe Sep 27 '14 at 21:59
- There's no way to do it other than iteration, so you're using the right method. Unless there's an issue in your code that makes it slower than necessary, there's no way to optimize this any further. – Ken White Sep 27 '14 at 21:59
- If you're working with a __really__ big directory, it __may__ be faster to get the total used space on that particular partition and subtract the combined size of all the _other_ directories. But that seems like a real edge case. – Paweł Stawarz Sep 27 '14 at 22:02
- @PawełStawarz: That would only work if you KNOW that your current directory is larger than all the other directories combined, and I can't see how general code could know that. You could do this if you really know that "my partition is dedicated to this purpose and will only contain these directories", but for a general-purpose function, recursively iterating over a directory is the only way. – Mats Petersson Sep 27 '14 at 22:06
- @MatsPetersson that's why I said it's an __edge case__. I can, however, see it occurring: many people have partitions that contain only the system or a movie collection. In those cases the subtraction would probably be the better idea. – Paweł Stawarz Sep 27 '14 at 22:17
- A terabyte drive today takes well over a minute to iterate. So your question boils down to "how do I tell the user what he wants to know without him waiting for it?" Simple: you show him you are Working On It. The user won't be surprised. – Hans Passant Sep 27 '14 at 22:20
- If the program is running as administrator, you can avoid iteration, and significantly reduce head movement, by scanning the MFT rather than the directory tree. This means scanning every file entry on the disk, but the scan is fast (circa 20000+ entries per second). – Harry Johnston Sep 27 '14 at 23:30
- @HansPassant: that is of course a good point, but I still wonder why Explorer is twice as fast as my application at this task (triple- and quadruple-checked). – Violet Giraffe Sep 28 '14 at 08:22
- Using `GetFileInformationByHandleEx` with `FILE_ID_BOTH_DIR_INFO` might be faster than using `FindFirstFile` and `FindNextFile`. – Harry Johnston Sep 28 '14 at 20:10
- I'm sure Raymond Chen has written about this precise issue fairly recently. – user1793036 Oct 01 '14 at 05:03

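
Hans Passant's suggestion (show the user that the scan is in progress) could be sketched as a scan that reports a running total through a callback every `report_every` files and lets the caller cancel. This is again portable C++17 `std::filesystem`, not a WinAPI call, and `DirSizeWithProgress`, `ProgressFn`, `ScanResult` and `report_every` are all hypothetical names. In a GUI program you would typically run such a scan on a worker thread and forward the updates to the UI thread.

```cpp
#include <cstdint>
#include <filesystem>
#include <functional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical callback type: receives the running file count and byte total.
// Returning false asks the scan to stop (e.g. the user pressed Cancel).
using ProgressFn =
    std::function<bool(std::uintmax_t files, std::uintmax_t bytes)>;

struct ScanResult {
    std::uintmax_t files = 0;
    std::uintmax_t bytes = 0;
    bool cancelled = false;
};

// Sums regular-file sizes below `root`, calling `on_progress` after every
// `report_every` files and once more when the scan completes.
ScanResult DirSizeWithProgress(const fs::path& root,
                               const ProgressFn& on_progress,
                               std::uintmax_t report_every = 1000) {
    ScanResult result;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    for (; !ec && it != end; it.increment(ec)) {
        const fs::directory_entry& entry = *it;
        std::error_code attr_ec;
        if (entry.is_symlink(attr_ec) || !entry.is_regular_file(attr_ec)) {
            continue;  // skip links, directories and special files
        }
        const std::uintmax_t size = entry.file_size(attr_ec);
        if (attr_ec) {
            continue;
        }
        ++result.files;
        result.bytes += size;
        if (report_every != 0 && result.files % report_every == 0 &&
            !on_progress(result.files, result.bytes)) {
            result.cancelled = true;
            return result;
        }
    }
    on_progress(result.files, result.bytes);  // final update
    return result;
}
```
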
1 Answer
Iterating over the files in the directory is the only generic way, and Windows certainly doesn't have any short-cut for doing this (in regular applications that run with user-level privileges - and I wouldn't suggest that your app should require admin rights just to run!).
There MAY be a bit of a difference, if the directory contains a very large number of files, depending on whether you do depth-first or breadth-first recursion of the directories. Breadth-first requires "saving" the subdirectories found in the current directory so they can be searched later, which could of course also lead to problems if you have many directories. Depth-first doesn't need any such storage, but it means the OS will have more directories open at once, and it probably causes more head movement. However, the difference is LIKELY very marginal. For a large filesystem (for example, when asking "how much space is used" on a whole drive) it could make a difference; I haven't actually tried.
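
The depth-first vs breadth-first choice can be made concrete with an explicit worklist of directories: popping from the back gives depth-first order, popping from the front gives breadth-first order. This is a sketch in portable C++17 with hypothetical names (`DirSizeWorklist`, `Order`, `max_pending`). Unlike the recursive depth-first walk described above, it finishes and closes each directory before opening the next, so in both orders it trades open directory handles for stored paths; `max_pending` reports the largest number of directories waiting in the worklist, which is the storage cost the answer mentions.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

enum class Order { DepthFirst, BreadthFirst };

// Hypothetical helper: sums regular-file sizes below `root` using an explicit
// worklist of directories still to be enumerated.
//   DepthFirst   pops from the back (stack): pending holds the unvisited
//                siblings along the current path.
//   BreadthFirst pops from the front (queue): pending can hold a whole
//                level of the tree at once.
// If `max_pending` is non-null it receives the largest worklist size seen.
std::uintmax_t DirSizeWorklist(const fs::path& root, Order order,
                               std::size_t* max_pending = nullptr) {
    std::uintmax_t total = 0;
    std::deque<fs::path> pending{root};
    std::size_t peak = pending.size();
    while (!pending.empty()) {
        fs::path dir;
        if (order == Order::DepthFirst) {
            dir = std::move(pending.back());
            pending.pop_back();
        } else {
            dir = std::move(pending.front());
            pending.pop_front();
        }
        // `it` is destroyed at the end of this loop iteration, so at most
        // one directory is open at any time.
        std::error_code ec;
        fs::directory_iterator it(
            dir, fs::directory_options::skip_permission_denied, ec);
        for (; !ec && it != fs::directory_iterator(); it.increment(ec)) {
            const fs::directory_entry& entry = *it;
            std::error_code attr_ec;
            if (entry.is_symlink(attr_ec)) {
                continue;  // neither follow nor count links
            }
            if (entry.is_directory(attr_ec)) {
                pending.push_back(entry.path());
            } else if (entry.is_regular_file(attr_ec)) {
                const std::uintmax_t size = entry.file_size(attr_ec);
                if (!attr_ec) {
                    total += size;
                }
            }
        }
        if (pending.size() > peak) {
            peak = pending.size();
        }
    }
    if (max_pending != nullptr) {
        *max_pending = peak;
    }
    return total;
}
```

For a root with two subdirectories that each contain two more, both orders return the same total, but the worklist peaks at 3 pending directories depth-first and 4 breadth-first. Whether either order is measurably faster on a particular disk is something to measure rather than assume.
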

- I disagree with the claim that *"Windows certainly doesn't have any short-cut for doing this"*. Directly accessing the [Master File Table (MFT)](http://msdn.microsoft.com/en-us/library/windows/desktop/aa365230.aspx) is very likely a **lot** faster than iterating over the files in a directory. – IInspectable Sep 30 '14 at 07:10
- @IInspectable: But only applications that run with raised privileges can do that. I will amend it to say "for regular applications". – Mats Petersson Sep 30 '14 at 07:14
- @IInspectable: From what I can determine, the MFT contains everything except the directory in which the file is located. That makes the MFT method suitable only for the root directory. – MSalters Sep 30 '14 at 07:32
- @MSalters: Directory index information is available as well and can be used to filter MFT records. – IInspectable Sep 30 '14 at 12:21