7

We have encountered an unexpected performance issue when traversing directories looking for files using a wildcard pattern.

We have 180 folders, each containing 10,000 files. A command-line search using dir <pattern> /s completes almost instantly (<0.25 seconds). However, the same search from our application takes 3-4 seconds.

We initially tried using System.IO.DirectoryInfo.GetFiles() with SearchOption.AllDirectories and have now tried the Win32 API calls FindFirstFile() and FindNextFile().

Profiling our code indicates that the vast majority of execution time is spent on these calls.

Our code is based on the following blog post:

http://codebetter.com/blogs/matthew.podwysocki/archive/2008/10/16/functional-net-fighting-friction-in-the-bcl-with-directory-getfiles.aspx

We found this to be slow, so we updated the GetFiles function to take a string search pattern rather than a predicate.

Can anyone shed any light on what might be wrong with our approach?

Richard Ev
  • what are you using to do the search from the command line? Could it be that it is using the Windows search indexes to do the query rather than stepping through every file? – Matt Breckon Nov 23 '09 at 10:29
  • @Matt we're just doing a `dir /s` (have updated my post accordingly). – Richard Ev Nov 23 '09 at 10:33
  • 1
    Sounds suspicious. I seriously doubt that "dir" uses anything other than FindFirstFile/FindNextFile as well. Maybe you are misusing them. Could you provide a snippet illustrating how you use them? – sharptooth Nov 23 '09 at 10:36
  • @sharptooth: I have added a link to a post that contains the source code we used – Richard Ev Nov 23 '09 at 11:12
  • 1
    @Matt dir does not use index service – Sheng Jiang 蒋晟 Nov 23 '09 at 19:54
  • @ShengJiang蒋晟 Does the [Directory.GetFiles Method](https://learn.microsoft.com/en-us/dotnet/api/system.io.directory.getfiles?redirectedfrom=MSDN&view=netframework-4.8#overloads) use the index service? How can I know this? – fabda01 Dec 27 '19 at 05:20
  • compare with ShellSearchFolder in Windows API Code Pack and you will know. – Sheng Jiang 蒋晟 Dec 29 '19 at 18:40

3 Answers

11

In my tests, FindFirstFileEx with FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH is much faster than plain FindFirstFile.

Scanning 20 folders with ~300,000 files took 661 seconds with FindFirstFile and 11 seconds with FindFirstFileEx. Subsequent calls to the same folders took less than a second.

HANDLE h=FindFirstFileEx(search.c_str(), FindExInfoBasic, &data, FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH); 
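
For context, here is a rough sketch of how that call might sit inside a recursive scan. It is only an illustration under several assumptions, not the code used for the timings above: the names JoinPath, IsDotEntry and FindFilesRecursive are made up for this example, it uses the wide-character (W) variants explicitly, it enumerates each directory once with "*" and filters names with PathMatchSpecW from shlwapi (whose wildcard matching can differ slightly from what FindFirstFile does itself, for example with 8.3 short names), and it needs Windows 7 / Server 2008 R2 or later for FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: join a directory and a name with exactly one backslash.
std::wstring JoinPath(const std::wstring& dir, const std::wstring& name)
{
    if (dir.empty() || dir.back() == L'\\')
        return dir + name;
    return dir + L"\\" + name;
}

// Hypothetical helper: true for the "." and ".." entries of a directory listing.
bool IsDotEntry(const wchar_t* name)
{
    return name[0] == L'.' &&
           (name[1] == L'\0' || (name[1] == L'.' && name[2] == L'\0'));
}

#ifdef _WIN32
#include <windows.h>
#include <shlwapi.h>                  // PathMatchSpecW
#pragma comment(lib, "Shlwapi.lib")   // MSVC-specific; link shlwapi explicitly elsewhere

// Appends to `results` every file under `dir` (recursively) whose name
// matches `pattern`, e.g. L"*.log".
void FindFilesRecursive(const std::wstring& dir,
                        const std::wstring& pattern,
                        std::vector<std::wstring>& results)
{
    WIN32_FIND_DATAW data;

    // Enumerate everything once; FindExInfoBasic skips the 8.3 short name
    // and FIND_FIRST_EX_LARGE_FETCH asks for a larger directory buffer.
    HANDLE h = FindFirstFileExW(JoinPath(dir, L"*").c_str(),
                                FindExInfoBasic, &data,
                                FindExSearchNameMatch, NULL,
                                FIND_FIRST_EX_LARGE_FETCH);
    if (h == INVALID_HANDLE_VALUE)
        return;  // missing or inaccessible directory: skip it

    do {
        if (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            // Skip "." and "..", and do not follow reparse points
            // (junctions, symlinks) so the scan cannot loop.
            if (!IsDotEntry(data.cFileName) &&
                !(data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT))
                FindFilesRecursive(JoinPath(dir, data.cFileName), pattern, results);
        } else if (PathMatchSpecW(data.cFileName, pattern.c_str())) {
            results.push_back(JoinPath(dir, data.cFileName));
        }
    } while (FindNextFileW(h, &data));

    FindClose(h);
}
#endif
```

A caller would declare a std::vector<std::wstring>, pass it together with a root folder and a pattern such as L"*.log", and read the matches back from the vector. Doing a single "*" pass per directory avoids a second enumeration just to find subfolders, which matters when each folder holds thousands of files.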
Richard Ev
KPexEA
  • On Windows 7 x64 the difference between using `FIND_FIRST_EX_LARGE_FETCH` and not is 0x10000 versus 0x1000 bytes for the find buffer (found with IDA). `FindExInfoBasic` may be more relevant here, though. – 0xC0000022L Oct 16 '18 at 22:39
  • My measurements (W10 x64, SSD disk drive) show that FindFirstFileEx is marginally (~14%) faster than FindFirstFile. Test folder had 900K files. Enumerations took typically 1.5 sec. Except the very first enumeration, when the enumeration takes 10x longer. (For both methods, of course.) – Jan Slodicka Apr 30 '19 at 10:48
  • Have to correct myself regarding the very first enumeration. After many benchmark tests I can confirm that the first enumeration (after clearing the disk cache) is a) ~3x slower in case of FindFirstFileEx()(...FindExInfoBasic, ...FIND_FIRST_EX_LARGE_FETCH), b) ~20x slower in case of FindFirstFile(). – Jan Slodicka May 02 '19 at 07:51
3

You can try with an implementation of FindFirstFile and FindNextFile I once blogged about.

Darin Dimitrov
0

Try IShellFolder::EnumObjects with SHGetDataFromIDList/IShellFolder::GetAttributesOf.

Pro/Cons here.

Kevin Panko
Sheng Jiang 蒋晟