
Here is the code I’m using:

using (StreamWriter output = new StreamWriter(Path.Combine(masterdestination, "Master.txt")))
{
    string masterfolders = sourcefolder1;
    string[] filess = Directory.GetFiles(masterfolders, "*.txt");
    foreach (string file in filess)
    {
        output.WriteLine(Path.GetFileName(file));
    }
}

This code searches a user-specified directory for every .txt file it contains. These directories sometimes contain 2 million files.

Monitoring this process while it's running, I've seen its memory usage climb to 800 MB. Is there a way I can preserve the speed of this process but limit the memory it uses? Or have it read, dump, and continue? A hashtable? Any ideas would be awesome.

Greg

5 Answers

15

Directory.GetFiles really sucks. If you can use .NET 4.0 you should look into using Directory.EnumerateFiles. From the docs:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
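
For illustration, here is a minimal sketch of the loop from the question rewritten with Directory.EnumerateFiles (.NET 4.0+), assuming the same masterdestination and sourcefolder1 variables:

using (StreamWriter output = new StreamWriter(Path.Combine(masterdestination, "Master.txt")))
{
    // Names are streamed back one at a time instead of being buffered into a huge array first.
    foreach (string file in Directory.EnumerateFiles(sourcefolder1, "*.txt"))
    {
        output.WriteLine(Path.GetFileName(file));
    }
}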

RichardOD
1

If you are implementing search then I suggest you use Windows Search 4.0
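
If you go that route, the index can be queried from C# through the Windows Search OLE DB provider. A rough sketch, assuming the Search.CollatorDSO provider is available and the folder is indexed (C:/SomeFolder is a placeholder path):

using System;
using System.Data.OleDb;

class SearchIndexExample
{
    static void Main()
    {
        // Windows Search exposes its index through an OLE DB provider.
        string connectionString =
            "Provider=Search.CollatorDSO;Extended Properties=\"Application=Windows\"";

        using (OleDbConnection connection = new OleDbConnection(connectionString))
        {
            connection.Open();

            // Ask the index for .txt files under the placeholder folder.
            string query =
                "SELECT System.ItemPathDisplay FROM SYSTEMINDEX " +
                "WHERE scope='file:C:/SomeFolder' AND System.FileExtension='.txt'";

            using (OleDbCommand command = new OleDbCommand(query, connection))
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader.GetString(0));
                }
            }
        }
    }
}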

Giorgi
1

If you cannot use Fx4, you are best off writing your own FileEnumerator. Here is one example.

H H
  • +1. I was going to suggest something like this as an alternative. I think CodeProject also has something similar. – RichardOD Dec 28 '09 at 19:09
1

Directory.GetFiles has to build a list of all the matching files before it can return. Only then can you enumerate them. So of course, it is expensive when there are lots of matching files. It may even build a list of all files internally.

If you can use .NET 4.0, then you could use Directory.EnumerateFiles, which avoids this problem by returning one file at a time. If you can't, then I would suggest you write this in C++ rather than C#.

In C++ you can use FindFirstFile, which also returns the files to you one at a time.

// Iterate through the *.txt files in this directory, one at a time.
#include <windows.h>   // WIN32_FIND_DATA, FindFirstFile, FindNextFile, FindClose
#include <shlwapi.h>   // PathCombine (link with shlwapi.lib)
#include <tchar.h>     // TCHAR, _T

TCHAR szWild[MAX_PATH];
PathCombine(szWild, masterfolders, _T("*.txt"));

WIN32_FIND_DATA fd;
HANDLE hFind = FindFirstFile(szWild, &fd);
if (INVALID_HANDLE_VALUE != hFind)
{
    do {
        TCHAR szFileName[MAX_PATH];
        PathCombine(szFileName, masterfolders, fd.cFileName);

        // write szFileName to the output stream here...

    } while (FindNextFile(hFind, &fd));

    FindClose(hFind);
}
John Knoeller
  • Not sure what TCHAR and WIN32_FIND_DATA are, or what usings or references they need. –  Dec 28 '09 at 19:17
  • This is pure Win32 C++ code. `#include "Windows.h"` for most of the types and prototypes. PathCombine comes from Shlwapi.h if I remember correctly. – John Knoeller Dec 28 '09 at 19:47
0

As mentioned in the answer here, if you are using .NET 4.0 you can use the static EnumerateFiles method on the Directory class to get an IEnumerable<string> instead of a string[], which is what leads to all the memory consumption.

If you are working with a version of .NET before .NET 4.0, you can easily mimic this functionality by calling the FindFirstFileEx, FindNextFile, etc. methods through the P/Invoke layer.

Then, for every file that is returned from a call to FindFirstFile/FindNextFile you would yield return the item.

This will cut down on memory consumption for directories with large numbers of files, just as EnumerateFiles does, because you aren't loading every name into an array up front but yielding each one for processing as you find it.
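
A rough sketch of what that could look like (the NativeFileEnumerator and EnumerateTxtFiles names, and the use of plain FindFirstFile rather than FindFirstFileEx, are illustrative choices here, not part of the original post):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class NativeFileEnumerator
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    // Yields matching file names one at a time, so no large array is ever built.
    public static IEnumerable<string> EnumerateTxtFiles(string directory)
    {
        WIN32_FIND_DATA findData;
        IntPtr hFind = FindFirstFile(Path.Combine(directory, "*.txt"), out findData);
        if (hFind == INVALID_HANDLE_VALUE)
            yield break;
        try
        {
            do
            {
                // Skip anything that happens to be a directory.
                if ((findData.dwFileAttributes & FileAttributes.Directory) == 0)
                    yield return findData.cFileName;
            }
            while (FindNextFile(hFind, out findData));
        }
        finally
        {
            FindClose(hFind);
        }
    }
}

// Usage, mirroring the loop in the question:
// foreach (string name in NativeFileEnumerator.EnumerateTxtFiles(masterfolders))
//     output.WriteLine(name);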

casperOne