
I'm trying to get a list of files in a specific directory that contains over 20 million files ranging from 2 to 20 KB each.
The problem is that my program throws an OutOfMemoryException every time, while tools like robocopy copy the folder to another directory with no problem at all. Here's the code I'm using to enumerate files:

            List<string> files = new List<string>(Directory.EnumerateFiles(searchDir));

What should I do to solve this problem? Any help would be appreciated.

Jeremy Mc
  • Don't create a list of the files. Just iterate over the result of `EnumerateFiles` and do whatever it is you want to do. – juharr Sep 28 '16 at 16:51
  • Are you trying to hold that much data in memory? One thing you can do is create subdirectories and break the files into groups. – Rohit Sep 28 '16 at 16:52
  • @Rohit Yes. I was trying to create a list, then iterate over them and do some processing. – Jeremy Mc Sep 28 '16 at 16:53
  • @juharr I'm going to try it right now and let you know of the result – Jeremy Mc Sep 28 '16 at 16:53
  • @JeremyMc -- What happens if you change `Directory.EnumerateFiles` to `Directory.GetFiles`? – rory.ap Sep 28 '16 at 16:53
  • Possible duplicate of [Retrieving files from directory that contains large amount of files](http://stackoverflow.com/questions/7865159/retrieving-files-from-directory-that-contains-large-amount-of-files) – Rohit Sep 28 '16 at 16:54
  • @JeremyMc Would need to see more code to determine if there are any other potential memory issues. – juharr Sep 28 '16 at 16:54
  • @rory.ap That would be even worse, as it would return an array of the files and then create a list from that array, thus doubling the amount of memory used. – juharr Sep 28 '16 at 16:55
  • @juharr Thanks, it worked perfectly. Habib wrote it as a post and I marked it as the answer. – Jeremy Mc Oct 01 '16 at 06:00

2 Answers


You are creating a list of 20 million objects in memory. I don't think you will ever use all of them at once, even if it becomes possible to hold them.

Instead, use Directory.EnumerateFiles(searchDir) and iterate over the items one by one.

Like this:

foreach(var file in Directory.EnumerateFiles(searchDir))
{
   //Copy to other location, or other stuff
}

With your current code, all 20 million path strings are loaded into memory first, and only then can you iterate over them or perform operations on them.

See: Directory.EnumerateFiles Method (String)

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
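
As a rough sketch of the streaming approach described above, the loop can be wrapped in a small helper so that each path is handled and then dropped. The `LazyFileWalk` class and `ProcessEach` method are hypothetical names used only for illustration; they are not part of the framework.

```csharp
using System;
using System.IO;

public static class LazyFileWalk
{
    // Hypothetical helper: hands each path to the callback as it is produced
    // and returns how many paths were handled.
    public static long ProcessEach(string searchDir, Action<string> handleFile)
    {
        long count = 0;
        // EnumerateFiles yields paths lazily, so no list of all paths is built here.
        foreach (string path in Directory.EnumerateFiles(searchDir))
        {
            handleFile(path);
            count++;
        }
        return count;
    }
}
```

It could be called as `LazyFileWalk.ProcessEach(searchDir, path => Console.WriteLine(path));`, with the lambda replaced by whatever per-file work is needed.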

Habib
  • Isn't that going to run into the same problem? – rory.ap Sep 28 '16 at 16:54
  • @rory.ap, no, it will not. This will not load all 20 million file paths into memory; instead, there will be one object *(string path)* in memory at a time. – Habib Sep 28 '16 at 16:55
  • @GillBates, no. Enumeration doesn't mean returning a collection. This will do lazy evaluation, just like `File.ReadLines` vs `File.ReadAllLines`. – Habib Sep 28 '16 at 16:58

The answer above covers one directory level. To be able to enumerate through multiple levels of directories, each having a large number of directories with a large number of files, one can do the following:

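using System.Collections.Generic;
using System.IO;

public IEnumerable<string> EnumerateFiles(string startingDirectoryPath) {
    // Queue of lazy directory sequences still to be visited (breadth-first).
    var directoryEnumerables = new Queue<IEnumerable<string>>();
    directoryEnumerables.Enqueue(new string[] { startingDirectoryPath });
    while (directoryEnumerables.Count > 0) {
        var currentDirectoryEnumerable = directoryEnumerables.Dequeue();
        foreach (var directory in currentDirectoryEnumerable) {
            // Qualify the call with Directory; an unqualified EnumerateFiles
            // would call this method again and never terminate.
            foreach (var filePath in Directory.EnumerateFiles(directory)) {
                yield return filePath;
            }
            // Subdirectories are enumerated lazily when this entry is dequeued.
            directoryEnumerables.Enqueue(Directory.EnumerateDirectories(directory));
        }
    }
}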

The function will traverse a collection of directories through enumerators, so it will load the directory contents one by one. The only thing left to solve is the depth of the hierarchy...
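
For comparison, the framework also has a built-in recursive overload that takes a search pattern and a `SearchOption`. Here is a minimal sketch that wraps it; the `RecursiveWalk` class and `EnumerateAllFiles` method are hypothetical names. I have not measured how it behaves on trees of this size, and a subdirectory that cannot be read may abort the enumeration with an exception, so treat it as an alternative to try rather than a drop-in replacement.

```csharp
using System.Collections.Generic;
using System.IO;

public static class RecursiveWalk
{
    // Hypothetical wrapper around the built-in recursive overload.
    public static IEnumerable<string> EnumerateAllFiles(string root)
    {
        // "*" matches every file name; AllDirectories descends into subdirectories.
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories);
    }
}
```

The result is consumed the same way as before, with a `foreach` over `RecursiveWalk.EnumerateAllFiles(startingDirectoryPath)`.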

Karatheodory