
I am looking for a method that takes a file extension and a directory and returns all the files within that directory and its subdirectories, ordered by latest creation date, i.e. latest files first.

So far I have identified the following method, which is meant to be fast. However, is there a better way of doing this? I also need it to return FileInfo rather than string, ordered as described above.

public static IEnumerable<string> GetFileList(string fileSearchPattern, string rootFolderPath)
{
    Queue<string> pending = new Queue<string>();
    pending.Enqueue(rootFolderPath);
    string[] tmp;
    while (pending.Count > 0)
    {
        rootFolderPath = pending.Dequeue();
        tmp = Directory.GetFiles(rootFolderPath, fileSearchPattern);
        for (int i = 0; i < tmp.Length; i++)
        {
            yield return tmp[i];
        }
        tmp = Directory.GetDirectories(rootFolderPath);
        for (int i = 0; i < tmp.Length; i++)
        {
            pending.Enqueue(tmp[i]);
        }
    }
}
Tommy

3 Answers


When I have researched this problem space, I've found there isn't a fast way to do this. No matter what approach you take, you end up going to the operating system for the list of files in a directory, and the file system doesn't cache or index the way a search engine would. So you end up needing to re-crawl the file system yourself.

Once you have the raw information, however, you can index it yourself.
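For example, the result of a single crawl can be cached in memory and re-sorted cheaply on later queries. A minimal sketch of that idea (the FileIndex class name is my own invention, and the cache-invalidation policy is left to you):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// A minimal one-shot index: crawl once, then answer "latest first"
// queries from memory instead of re-hitting the file system.
public class FileIndex
{
    private readonly List<FileInfo> _files;

    public FileIndex(string rootFolderPath, string fileSearchPattern)
    {
        // The expensive part: a single recursive crawl.
        _files = Directory
            .EnumerateFiles(rootFolderPath, fileSearchPattern, SearchOption.AllDirectories)
            .Select(path => new FileInfo(path))
            .ToList();
    }

    // Cheap: sorts the cached FileInfo list, newest first.
    public IEnumerable<FileInfo> LatestFirst() =>
        _files.OrderByDescending(f => f.CreationTime);
}
```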

Philip Pittle

The below will work for your purposes. You want to use Directory.EnumerateFiles(...) so the file listing uses less memory up front: it only goes looking for the next element when you ask for it, instead of loading the entire collection into memory at the start.

Directory.EnumerateFiles(rootFolderPath, fileSearchPattern, System.IO.SearchOption.AllDirectories).OrderByDescending(file => new FileInfo(file).CreationTime)

One additional consideration: since you are doing a fairly blind search through the file system, if enumerating a file throws an exception, it will invalidate the enumerator, causing it to exit without finishing. I have posted a solution to that problem here
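Put together for the question's exact requirements (returning FileInfo, newest first), this approach could look like the sketch below; the GetFileList name and static-class wrapper are taken from the question and my own framing respectively:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileSearch
{
    // EnumerateFiles streams paths lazily, but note the ordering step
    // must still consume the whole sequence before yielding the first
    // (newest) result -- sorting requires seeing every element.
    public static IEnumerable<FileInfo> GetFileList(string fileSearchPattern, string rootFolderPath)
    {
        return Directory
            .EnumerateFiles(rootFolderPath, fileSearchPattern, SearchOption.AllDirectories)
            .Select(path => new FileInfo(path))
            .OrderByDescending(fi => fi.CreationTime);
    }
}
```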

Matthew Brubaker
  • Is lazy loading going to help in this regard? The OP is trying to get all the files anyway, so deferred execution won't engender a performance boost here. – Philip Pittle Aug 22 '14 at 14:16
  • Lazy loading is probably not the correct wording. Using EnumerateFiles(...) has a smaller memory footprint and returns faster, because it only goes looking when you actually ask for the next element. I have clarified my answer to indicate this. – Matthew Brubaker Aug 22 '14 at 14:19

Directory.GetFiles does have an option to search recursively.

The following should work, although I haven't tried it.

    IEnumerable<FileInfo> GetFileList(string directory, string extension)
    {
        return Directory.GetFiles(directory, "*" + extension, SearchOption.AllDirectories)
            .Select(f => new FileInfo(f))
            .OrderByDescending(f => f.CreationTime);
    } 
Kenny Hung
  • This is an inefficient solution because Directory.GetFiles() will wait until it has loaded all the files before returning. You should use Directory.EnumerateFiles(...) instead. – Matthew Brubaker Aug 22 '14 at 14:13
  • Normally, I'd agree, but we're ordering the files by creation time, so we probably have to load all the files first anyway. – Kenny Hung Aug 22 '14 at 14:18
  • Why? The file creation time isn't going to change and if you have a particularly large folder structure, you risk getting an out of memory error by loading that many strings into memory. The LINQ OrderBy method won't evaluate until you ask for the next element anyway. – Matthew Brubaker Aug 22 '14 at 14:22
  • Ah: you've put your FileInfo creation in the OrderBy: that is, indeed, going to save you memory space. Unfortunately, the question asks for the FileInfo back, hence my solution: however, if loading the large file structure into memory as FileInfo is a problem, then Matthew's solution would avoid that, but you would probably need to construct the FileInfo twice. – Kenny Hung Aug 23 '14 at 21:59
  • You can still put the .Select() in there, that part isn't what is saving you memory. The major memory savings come from using Directory.EnumerateFiles(...) instead of Directory.GetFiles(...). Directory.GetFiles(...) will create a string instance for every file in the collection immediately upon execution. Directory.EnumerateFiles(...) will instead only create a string instance when you ask for the 'next' file. – Matthew Brubaker Aug 25 '14 at 16:29
  • I know that the Select() isn't saving memory: but the OrderBy will need to evaluate new FileInfo(file).CreationTime for each file, so you will need to store the creation time and filename for each file in your answer, and then re-evaluate new FileInfo(file) for each file as you enumerate through your resulting IEnumerable. – Kenny Hung Aug 27 '14 at 20:16
  • In my solution, you would need to store filenames, their corresponding FileInfo instances, and their creation times for each file (hence using up more memory), but have the payback of constructing a new FileInfo(file) once for each file. You're right, though: if I used EnumerateFiles instead of GetFiles in my answer, I would only have to store all the FileInfo instances, and one filename at a time, so getting memory savings that way. However, lazy loading like this seldom comes for free: GetFiles followed by ordering is probably a tiny bit faster than EnumerateFiles followed by ordering. – Kenny Hung Aug 27 '14 at 20:24
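The trade-off debated in these comments can be made concrete. Both shapes below compile and return the same ordering; the first constructs each FileInfo exactly once and hands it back, while the second builds a FileInfo only as a throwaway sort key, so a caller wanting FileInfo back must construct it a second time. The class and method names are mine, purely for illustration:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ConstructionTradeoff
{
    // One FileInfo per file: materialize first, then sort on the cached object.
    public static IEnumerable<FileInfo> SingleConstruction(string root, string pattern) =>
        Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories)
            .Select(p => new FileInfo(p))
            .OrderByDescending(fi => fi.CreationTime);

    // FileInfo built inside the key selector: the sequence stays strings,
    // so getting FileInfo out requires constructing each one again.
    public static IEnumerable<string> DoubleConstruction(string root, string pattern) =>
        Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories)
            .OrderByDescending(p => new FileInfo(p).CreationTime);
}
```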