I'm trying to find some directories on a network drive.

I use Directory.EnumerateDirectories for this. The problem is that it takes very long because there are many subdirectories.

Is there a way to make the function stop descending into subdirectories once a match is found, and carry on with the next directory on the same level?

static readonly Regex RegexValidDir = new ("[0-9]{4,}\\.[0-9]+$");
var dirs = Directory.EnumerateDirectories(startDir, "*.*", SearchOption.AllDirectories)
                .Where(x => RegexValidDir.IsMatch(x));

The directory structure looks like this:

a\b\20220902.1\c\d\
a\b\20220902.2\c\d\e
a\b\x\20220902.3\
a\b\x\20221004.1\c\
a\b\x\20221004.2\c\
a\b\x\20221004.3\d\e\f\
...
a\v\w\x\20221104.1\c\d
a\v\w\x\20221105.1\c\d
a\v\w\x\20221106.1\c\d
a\v\w\x\20221106.2\c\d
a\v\w\x\20221106.3\c\d
a\v\w\x\20221106.4\

I'm only interested in the directories with a date in the name, and I want to stop searching further down into the subdirectories of a matching directory.

Another thing: I don't know whether the search pattern I'm supplying (*.*) is correct for my usage scenario.

The directories are found relatively quickly, but it then takes another 11 minutes for the search to complete.

BenHero
  • "The directories are found relatively quickly, but it then takes another 2-3 minutes to complete the search function" What does that mean? How do you measure case 1, and how case 2? – Tim Schmelter Dec 05 '22 at 08:49
  • The loop that iterates over "dirs" is entered within a few seconds, and I can see the Console.WriteLine output for each directory appear quickly. After that loop I do an OrderBy("DateTime"), and this takes 12 (!) minutes to finish (measured with a Stopwatch). – BenHero Dec 05 '22 at 10:29
  • The list contains only 23 "top" directories that hold date directories. The one with the most date directories has 103 entries. All in all, about 500 relevant directories were found. I'm using a UNC network share as the start directory in a professional Gbit network environment... – BenHero Dec 05 '22 at 10:35

1 Answer

I don't think it's possible to prune the enumeration efficiently with the built-in Directory.EnumerateDirectories method in SearchOption.AllDirectories mode. My suggestion is to write a custom recursive iterator that lets you select the children of each individual item:

static IEnumerable<T> Traverse<T>(IEnumerable<T> source,
    Func<T, IEnumerable<T>> childrenSelector)
{
    foreach (T item in source)
    {
        IEnumerable<T> children = childrenSelector(item);
        yield return item;
        if (children is null) continue;

        foreach (T child in Traverse(children, childrenSelector))
            yield return child;
    }
}

Then for the directories that match the date pattern, you can just return null children, effectively stopping the recursion for those directories:

IEnumerable<string> query = Traverse(new[] { startDir }, path =>
{
    if (RegexValidDir.IsMatch(path)) return null; // Stop recursion
    return Directory.EnumerateDirectories(path);
}).Where(path => RegexValidDir.IsMatch(path));

This query is slightly inefficient because the RegexValidDir pattern is matched twice for each path (once in the childrenSelector and once in the predicate of the Where). In case you want to optimize it, you could modify the Traverse method by replacing the childrenSelector with a more complex lambda that returns both the children and whether the item should be yielded by the iterator: Func<T, (IEnumerable<T>, bool)>. Alternatively, use Traverse as is, with T being (string, bool) instead of string.
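
The first of those two options might look roughly like the sketch below. The names TraverseWhere, FindDateDirs and PrunedSearch are made up for illustration and are not part of the answer's original code; the regex is the one from the question. Treat it as a minimal sketch, not a tested drop-in replacement.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class PrunedSearch
{
    // Same pattern as in the question, written as a verbatim string.
    static readonly Regex RegexValidDir = new(@"[0-9]{4,}\.[0-9]+$");

    // Hypothetical variant of Traverse: the selector is called once per item
    // and reports both the children to descend into (null = do not descend)
    // and whether the item itself should be yielded.
    public static IEnumerable<T> TraverseWhere<T>(IEnumerable<T> source,
        Func<T, (IEnumerable<T>? Children, bool Include)> selector)
    {
        foreach (T item in source)
        {
            var (children, include) = selector(item);
            if (include) yield return item;
            if (children is null) continue;

            foreach (T child in TraverseWhere(children, selector))
                yield return child;
        }
    }

    // The regex now runs only once per path: a match is yielded and pruned,
    // a non-match is skipped but its subdirectories are still visited.
    public static IEnumerable<string> FindDateDirs(string startDir) =>
        TraverseWhere(new[] { startDir }, path =>
            RegexValidDir.IsMatch(path)
                ? ((IEnumerable<string>?)null, true)
                : (Directory.EnumerateDirectories(path), false));
}
```

Like the original query, the result is lazy: nothing touches the network share until the sequence is enumerated, and enumerating it a second time walks the directories again. If the result is used more than once (for example, printed and then sorted), materializing it once with ToList() avoids the repeated walk.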

Theodor Zoulias
  • Sounds interesting. I'll try to follow your approach and give feedback soon. – BenHero Dec 05 '22 at 12:23
  • Thanks for your idea. Works very well. It takes only about 12 seconds to complete!! I had to add some directories to exclude, but all in all your solution works great. – BenHero Dec 06 '22 at 08:29
  • @BenHero in case you are interested in a non-recursive variant of `Traverse`, there are some implementations here: [How to flatten tree via LINQ?](https://stackoverflow.com/questions/11830174/how-to-flatten-tree-via-linq) For understanding the theory behind the transformation, this question is quite helpful: [Way to go from recursion to iteration](https://stackoverflow.com/questions/159590/way-to-go-from-recursion-to-iteration). – Theodor Zoulias Dec 07 '22 at 06:30
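
For reference, one possible shape of such a non-recursive variant is sketched below. It is not taken from the linked questions; the names TraverseIterative and TreeTraversal are hypothetical. It replaces the recursive call with an explicit stack of enumerators and should yield items in the same depth-first order as the recursive Traverse, with a null result from childrenSelector still meaning "do not descend".

```csharp
#nullable enable
using System;
using System.Collections.Generic;

static class TreeTraversal
{
    // Hypothetical iterative counterpart of Traverse: each stack entry is the
    // enumerator of one level that is still being walked.
    public static IEnumerable<T> TraverseIterative<T>(IEnumerable<T> source,
        Func<T, IEnumerable<T>?> childrenSelector)
    {
        var stack = new Stack<IEnumerator<T>>();
        stack.Push(source.GetEnumerator());
        try
        {
            while (stack.Count > 0)
            {
                IEnumerator<T> current = stack.Peek();
                if (!current.MoveNext())
                {
                    // This level is exhausted: go back up to the parent level.
                    stack.Pop().Dispose();
                    continue;
                }

                T item = current.Current;
                IEnumerable<T>? children = childrenSelector(item);
                yield return item;

                // Descend into the children before continuing with siblings.
                if (children is not null)
                    stack.Push(children.GetEnumerator());
            }
        }
        finally
        {
            // Dispose any enumerators left over if the caller stops early.
            while (stack.Count > 0) stack.Pop().Dispose();
        }
    }
}
```

It can be called exactly like the recursive version, for example TraverseIterative(new[] { startDir }, path => ...). The main practical difference is that the nesting depth no longer adds a layer of iterators per directory level.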