I need to get a list of all Word documents (*.doc and *.docx) stored in a Windows folder that has many levels of subfolders.

[Searching for a file with C#](http://stackoverflow.com/questions/3102786/searching-for-a-file-with-c-sharp) has an answer that works, but it is two years old and takes 10 seconds to search through 1,500 files (in the future there may be 10,000 or more). My code, below, is basically a copy of that answer. Does anyone have a better solution?

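```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;

DateTime dt = DateTime.Now;
DirectoryInfo dir = new DirectoryInfo(MainFolder); // MainFolder: the root folder to search
List<FileInfo> matches =
          new List<FileInfo>(dir.GetFiles("*.doc*", SearchOption.AllDirectories));
TimeSpan ts = DateTime.Now - dt;
MessageBox.Show(matches.Count + " matches in " + ts.TotalSeconds + " seconds");
```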
General Grey
  • Instead of a List, write to a Dictionary. It's fast. Otherwise I think you already have the fastest approach. – Nikhil Agrawal May 15 '12 at 16:18
  • possible duplicate of [Searching for a file with C#](http://stackoverflow.com/questions/3102786/searching-for-a-file-with-c-sharp) – Robert Levy May 15 '12 at 16:18
  • @RobertLevy: He gave that link himself. – Nikhil Agrawal May 15 '12 at 16:19
  • @RobertLevy Wow. Read the question before commenting; I referenced that exact link in my question. – General Grey May 15 '12 at 16:19
  • Maybe this link will help you: http://stackoverflow.com/questions/7865159/retrieving-files-from-directory-that-contains-large-amount-of-files – Fabske May 15 '12 at 16:24
  • I doubt you are going to get much more out of that. You might try dir.GetFiles("*.doc|*.docx", ...) though. – Tony Hopkinson May 15 '12 at 16:26
  • I'm aware. But saying "hey, I know this is a duplicate" in the question doesn't make it OK. Go back to the original question and add a bounty if you want to bring it back to life. – Robert Levy May 15 '12 at 16:44
  • @RobertLevy It was two years old. I did not want to bring it back to life; I wanted to know whether technology had changed and made it obsolete. If I had already pointed out that it's the same question, what did you expect to gain by saying it as well? Uhm, hey, the sky is blue! – General Grey May 15 '12 at 16:47
  • @RobertLevy Real mature, Robert: downvoting me because I made fun of you... – General Grey May 15 '12 at 16:59

4 Answers

You can use Directory.EnumerateFiles (or DirectoryInfo.EnumerateFiles; both were added in .NET 4.0) instead of GetFiles. It returns the files as an IEnumerable<T>, which lets you begin processing the results immediately instead of waiting for the entire list to be built.

If you're merely counting the files or listing all of them, it may not help. If, however, you can process and/or filter the results as they arrive, and especially if you can do some of that work on other threads, it can be significantly faster.

From the documentation:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
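A minimal sketch of the early-exit idea, assuming .NET 4.0 or later and a hypothetical helper named FindWordDocs. Because EnumerateFiles is lazy, LINQ's Take stops the walk once enough paths have been produced; the cap mirrors the 200-result limit the asker mentions in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WordDocSearch
{
    // Hypothetical helper: returns at most maxResults paths under root that
    // match the asker's "*.doc*" pattern.
    // EnumerateFiles yields paths lazily while it walks the tree; once Take()
    // has received maxResults items the enumeration ends, so folders not yet
    // visited are skipped entirely.
    public static List<string> FindWordDocs(string root, int maxResults)
    {
        return Directory
            .EnumerateFiles(root, "*.doc*", SearchOption.AllDirectories)
            .Take(maxResults)
            .ToList();
    }
}
```

For example, `WordDocSearch.FindWordDocs(MainFolder, 200)` would return as soon as the 200th match is found. Note that with SearchOption.AllDirectories a single inaccessible subfolder raises an UnauthorizedAccessException and ends the whole enumeration.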

Reed Copsey
  • I am merely listing the file names. This sounds promising, though, because I can add to the list as it searches, so at least the program doesn't freeze. Do I need to run this in a background worker, though? – General Grey May 15 '12 at 16:34
  • @K'Leg Yes - if you want to keep the program from freezing, I'd use this to make "blocks" of files in a background thread, and then marshal a block at a time (to add to your display) back to the UI thread. (Doing it item by item will probably cause it to run even slower than it does now...) – Reed Copsey May 15 '12 at 16:35
  • While there are 1,500 or so files, I only expect it to return somewhere between 5 and 50, and if it reaches 200 I intend to stop the search. – General Grey May 15 '12 at 16:37
  • @K'Leg in that case, using EnumerateFiles will be FAR superior, as you can stop enumerating part way through ;) – Reed Copsey May 15 '12 at 16:43
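The approach described in the comments above (build "blocks" of files on a background thread, hand each block back to the UI thread, stop at a limit) could be sketched as follows. Assumptions: .NET 4.5 or later for Task.Run and Progress<T>; SearchAsync, batchSize and maxResults are hypothetical names. Progress<T> captures the SynchronizationContext of the thread that creates it, so when it is created on the WinForms UI thread its callback runs there, which takes the place of manual marshalling.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class BatchedSearch
{
    // Walks the tree on a thread-pool thread and reports matches in blocks of
    // batchSize, stopping after maxResults. Returns the number of matches found.
    public static Task<int> SearchAsync(
        string root, int batchSize, int maxResults,
        IProgress<IReadOnlyList<string>> progress)
    {
        return Task.Run(() =>
        {
            var batch = new List<string>(batchSize);
            int total = 0;
            foreach (var path in Directory.EnumerateFiles(root, "*.doc*", SearchOption.AllDirectories))
            {
                batch.Add(path);
                total++;
                if (batch.Count == batchSize)
                {
                    progress.Report(batch);
                    // Start a fresh list: the reported one now belongs to the receiver.
                    batch = new List<string>(batchSize);
                }
                if (total == maxResults)
                    break; // stop walking the disk early
            }
            if (batch.Count > 0)
                progress.Report(batch); // final partial block
            return total;
        });
    }
}
```

In a form, the caller would create the `Progress<IReadOnlyList<string>>` inside an event handler (so it captures the UI context), `await` SearchAsync, and add each block to its list control in the progress callback. Reporting whole blocks rather than single items keeps the number of cross-thread posts small.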
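I doubt there's much you can do with that.

A more restrictive pattern might have an impact, but note that the search pattern only understands the * and ? wildcards, not | alternation, so dir.GetFiles("*.doc|*.docx", SearchOption.AllDirectories) won't work as intended: depending on the .NET version it may throw an ArgumentException or simply match nothing.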
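Because a single search pattern can't say "doc or docx", one option is to keep a broad wildcard for the disk walk and apply the exact extension check in code. This is a sketch assuming .NET 4.0 or later; ExtensionFilter and FindDocAndDocx are hypothetical names. It does not make the disk I/O any faster (the walk is the same), but it gives an exact result set, excluding for example .docm files that "*.doc*" lets through.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ExtensionFilter
{
    // The exact extensions wanted; OrdinalIgnoreCase so "REPORT.DOCX" matches too.
    static readonly HashSet<string> WordExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".doc", ".docx" };

    // One pass over the tree with a broad wildcard, then an exact check on
    // each file's extension (Path.GetExtension includes the leading dot).
    public static List<string> FindDocAndDocx(string root)
    {
        return Directory
            .EnumerateFiles(root, "*.doc*", SearchOption.AllDirectories)
            .Where(path => WordExtensions.Contains(Path.GetExtension(path)))
            .ToList();
    }
}
```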

Tony Hopkinson
If you want the full list, then other than making sure the Windows Indexing Service is enabled on the target folders, not really. Your main delay is reading from the hard drive, and no optimization of your C# code will make that any faster. You could create your own simple indexing service, perhaps using a FileSystemWatcher, that would give you sub-second response times no matter how many documents are added.
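The self-built index idea could look roughly like this: one full scan at start-up, after which a FileSystemWatcher keeps an in-memory set of paths current, so a query never touches the disk. This is a sketch, not a hardened service; WordDocIndex and Snapshot are hypothetical names, and the watcher has no Filter so that renames from temporary names (e.g. "x.tmp" to "x.docx") are still seen.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical in-memory index of .doc/.docx paths under a root folder.
sealed class WordDocIndex : IDisposable
{
    // Windows paths are case-insensitive. Events arrive on thread-pool
    // threads, hence the concurrent collection.
    readonly ConcurrentDictionary<string, byte> _paths =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);
    readonly FileSystemWatcher _watcher;

    public WordDocIndex(string root)
    {
        // Start watching before the initial scan so files created during the
        // scan are not missed (adding the same path twice is harmless).
        _watcher = new FileSystemWatcher(root) { IncludeSubdirectories = true };
        _watcher.Created += (s, e) => AddIfWordDoc(e.FullPath);
        _watcher.Deleted += (s, e) => _paths.TryRemove(e.FullPath, out _);
        _watcher.Renamed += (s, e) =>
        {
            _paths.TryRemove(e.OldFullPath, out _);
            AddIfWordDoc(e.FullPath);
        };
        _watcher.EnableRaisingEvents = true;

        // The one slow, disk-bound step.
        foreach (var path in Directory.EnumerateFiles(root, "*.doc*", SearchOption.AllDirectories))
            AddIfWordDoc(path);
    }

    void AddIfWordDoc(string path)
    {
        var ext = Path.GetExtension(path);
        if (ext.Equals(".doc", StringComparison.OrdinalIgnoreCase) ||
            ext.Equals(".docx", StringComparison.OrdinalIgnoreCase))
            _paths.TryAdd(path, 0);
    }

    // Copy of the current index; its cost depends on the index size, not the disk.
    public IReadOnlyCollection<string> Snapshot() => _paths.Keys.ToList();

    public void Dispose() => _watcher.Dispose();
}
```

A real implementation would also need to handle folder deletes and renames (which may raise a single event for the folder rather than one per file) and the watcher's Error event, which fires when its internal buffer overflows; rescanning is the simple fallback in both cases.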

Paul
First, I suggest using Stopwatch instead of DateTime to measure the elapsed time; Stopwatch is designed for exactly that.
Second, to make the search faster, don't copy the result of GetFiles into a List; use the array it returns directly.
Finally, tighten your search pattern: you want every .doc and .docx file, so try "*.doc?".
Here is my suggestion:

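```csharp
using System.Diagnostics;
using System.IO;
using System.Windows.Forms;

var sw = new Stopwatch();
sw.Start();

var matches = Directory.GetFiles(MainFolder, "*.doc?", SearchOption.AllDirectories); // string[], no List copy

sw.Stop();
MessageBox.Show(matches.Length + " matches in " + sw.Elapsed.TotalSeconds + " seconds");
```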
Nicolas
  • I appreciate the Stopwatch suggestion. As for the *.doc? filter, it makes no performance difference at all; the search takes the same amount of time. What does ? do compared to *? – General Grey May 15 '12 at 16:40
  • According to the MSDN documentation, * matches zero or more characters, and ? matches zero or one character. – Nicolas May 15 '12 at 16:43
  • Interesting, so it might boost the speed slightly, but in my case it's not noticeable with the number of files I am searching. Thank you. – General Grey May 15 '12 at 16:50