38

Not sure how to call it, but say you have a class that looks like this:

class Person
{
    public string Name;
    public IEnumerable<Person> Friends;
}

You then have a person and you want to "unroll" this structure recursively so you end up with a single list of all people without duplicates.

How would you do this? I have already made something that seems to be working, but I am curious to see how others would do it and especially if there is something built-in to Linq you can use in a clever way to solve this little problem :)


Here is my solution:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector)
{
    // Stop if subjects are null or empty
    if(subjects == null)
        yield break;

    // For each subject
    foreach(var subject in subjects)
    {
        // Yield it
        yield return subject;

        // Then yield all its decendants
        foreach (var decendant in SelectRecursive(selector(subject), selector))
            yield return decendant;
    }
}

Would be used something like this:

var people = somePerson.SelectRecursive(x => x.Friends);
John Saunders
  • 160,644
  • 26
  • 247
  • 397
Svish
  • 152,914
  • 173
  • 462
  • 620
  • I'm missing something... if you have loops there, will it ever stop? – Kobi Jan 06 '10 at 10:44
  • @Kobi: This is done by `if(!subjects.Any()) yield break;` – Oliver Jan 06 '10 at 11:03
  • @Oliver: No it won't. That will only stop if the subjects list is empty. So I guess I could actually have skipped that part totally since it won't make any difference... – Svish Jan 06 '10 at 11:21
  • 1
    @Kobi: Nope, you are not missing anything. It will never stop :p The stuff I was working with when I made it would never have any cycles so didn't bother doing anything about it. If I needed to, I would probably use a HashSet to keep track of the subjects I had already visited. – Svish Jan 06 '10 at 11:22
  • Removed the !subjects.Any() part, as it wasn't really do any good and just confuses :p – Svish Jan 06 '10 at 13:27

7 Answers7

44

I don't believe there's anything built into LINQ to do this.

There's a problem with doing it recursively like this - you end up creating a large number of iterators. This can be quite inefficient if the tree is deep. Wes Dyer and Eric Lippert have both blogged about this.

You can remove this inefficiency by removing the direct recursion. For example:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects,
    Func<T, IEnumerable<T>> selector)
{
    if (subjects == null)
    {
        yield break;
    }

    Queue<T> stillToProcess = new Queue<T>(subjects);

    while (stillToProcess.Count > 0)
    {
        T item = stillToProcess.Dequeue();
        yield return item;
        foreach (T child in selector(item))
        {
            stillToProcess.Enqueue(child);
        }
    }
}

This will also change the iteration order - it becomes breadth-first instead of depth-first; rewriting it to still be depth-first is tricky. I've also changed it to not use Any() - this revised version won't evaluate any sequence more than once, which can be handy in some scenarios. This does have one problem, mind you - it will take more memory, due to the queuing. We could probably alleviate this by storing a queue of iterators instead of items, but I'm not sure offhand... it would certainly be more complicated.

One point to note (also noted by ChrisW while I was looking up the blog posts :) - if you have any cycles in your friends list (i.e. if A has B, and B has A) then you'll recurse forever.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • The cyclicity problem can easily be fixed by associating a `visited` flag with each node, of course. – Will Vousden Jan 06 '10 at 10:53
  • 3
    @Inquisitor: Only if the type is mutable. Otherwise you could use a `HashSet` to store items you'd already visited. – Jon Skeet Jan 06 '10 at 10:55
  • That's exactly the cleverness I was looking for, hehe. Will probably use this instead and maybe add the HashSet to prevent infinite cycles, just for cleaner and safer code. Thanks! – Svish Jan 06 '10 at 11:24
  • @Inquisitor: You will still have the problem of how to clear the visited flag before starting iteration. Looks like this will require one to visit each node, just to clear this visited flag. Chicken and egg situation. – Tarydon Jan 06 '10 at 12:23
  • You could just clear the visited flag before you visit each node. No wait... :p – Svish Jan 06 '10 at 12:32
  • The iteration order doesn't matter since the the goal is just to get them all. – Svish Jan 06 '10 at 13:28
  • @Jon Skeet: Would I be correct to assume that the `if(!visitedSubjects.Add(item)) continue;` (where `visitedSubjects` would be a `HashSet`) should be just before the `yield return item`? – Svish Jan 06 '10 at 13:33
  • 1
    Why is it hard to do it depth-first? Don't you just replace the queue with a stack? – Eric Lippert Jan 06 '10 at 21:12
  • 3
    @Eric: That's very possible... although you then get it depth first *and last first within each collection* so it still doesn't match the original order :( Again, I'm sure it could with a bit more effort - but my brain isn't up to thinking it through at the moment. – Jon Skeet Jan 06 '10 at 21:43
  • 2
    Ah, yes, I see EXACTLY what you mean. Interesting coincidence, I was just now examining the recursive algorithm we use to determine what order classes are emitted, and wondering if it could be made iterative. Making this algorithm iterative has exactly this problem; it is not exactly depth-first because then that reverses the order in which classes in a given namespace are emitted. It should be pretty easy to fix it up with judicious use of the Reverse() sequence operator. – Eric Lippert Jan 06 '10 at 22:43
  • 1
    Actually, there are simpler solutions for the "visited" flag - like using a GUID (which is generated on each unroll) or using an integer which is incremented at each unroll. Then you don't check whether the "visited" flag is set; you check if it is set to the value of this specific unroll. However, this solution still has a problem if you're using multithreading and two threads want to unroll at the same time... – Vilx- Jan 07 '10 at 10:53
  • 1
    @Eric: This is an older question but I have a solution that preserves the order. I realized today as I was puzzling over this for my own project - just push the enumerator itself on a stack. See my answer (added for posterity). – Kevin Brock Jan 12 '12 at 02:52
  • What if I want to get the depth level or tree level of the node that I'm iterating on? – ebram khalil Jun 15 '14 at 15:03
  • @ebramtharwat One solution is to push a Tuple and when pushing new element, you take the current int + 1 and insert that ... –  Apr 10 '16 at 15:09
  • @EricLippert, @JonSkeet, a `Stack` should work where all iterations adding new elements are done with `.Reverse()` ? –  Apr 11 '16 at 08:14
  • @BjörnAliGöransson: Possibly, although using `Reverse()` ends up evaluating the sequence completely, which loses various benefits. – Jon Skeet Apr 11 '16 at 08:17
  • 1
    @JonSkeet Then, a Stack of Queues could be a better trade-off. –  Apr 11 '16 at 09:21
  • Isn't the sequence already being evaluated completely due to `new Queue(subjects)` ? – Chronicle Mar 04 '20 at 20:08
  • 1
    @Chronicle: That's evaluating the top-level sequence, yes... but then it lazily processes each of them one at a time to retrieve the lower ones. I'd have to look a bit more carefully - when I'm not sleepy at the difference in terms of the number of iterators, but I still believe this removes a lot of the inefficiency. – Jon Skeet Mar 05 '20 at 13:59
15

I found this question as I was looking for and thinking about a similar solution - in my case creating an efficient IEnumerable<Control> for ASP.NET UI controls. The recursive yield I had is fast but I knew that could have extra cost, since the deeper the control structure the longer it could take. Now I know this is O(n log n).

The solution given here provides some answer but, as discussed in the comments, it does change the order (which the OP did not care about). I realized that to preserve the order as given by the OP and as I needed, neither a simple Queue (as Jon used) nor Stack would work since all the parent objects would be yielded first and then any children after them (or vice-versa).

To resolve this and preserve the order I realized the solution would simply be to put the Enumerator itself on a Stack. To use the OPs original question it would look like this:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector)
{
    if (subjects == null)
        yield break;

    var stack = new Stack<IEnumerator<T>>();

    stack.Push(subjects.GetEnumerator());

    while (stack.Count > 0)
    {
        var en = stack.Peek();
        if (en.MoveNext())
        {
            var subject = en.Current;
            yield return subject;

            stack.Push(selector(subject).GetEnumerator());
        }
        else 
        {
            stack.Pop().Dispose();
        }
    }
}

I use stack.Peek here to keep from having to push the same enumerator back on to the stack as this is likely to be the more frequent operation, expecting that enumerator to provide more than one item.

This creates the same number of enumerators as in the recursive version but will likely be fewer new objects than putting all the subjects in a queue or stack and continuing to add any descendant subjects. This is O(n) time as each enumerator stands on its own (in the recursive version an implicit call to one MoveNext executes MoveNext on the child enumerators to the current depth in the recursion stack).

Tok'
  • 565
  • 8
  • 11
Kevin Brock
  • 8,874
  • 1
  • 33
  • 37
  • 4
    You should dispose the enumerator after you pop it from the stack. – Logerfo Jul 17 '17 at 20:35
  • 1
    NOTE: *"put the Enumerator itself on a Stack"* - this works, but the cost of this is creating a lot of enumerators, one per recursed node. Contrast this with Jon's solution, which does not create the same numeration order, but avoids `GetEnumerator` calls on all the descendents. One optimization is to have node (subject) class implement `ICollection` so can do `if (node.Count > 0) stack.Push(selector(node).GetEnumerator());` This avoids creating enumerator on all "leaf" nodes. – ToolmakerSteve Nov 20 '21 at 02:52
  • @ToolmakerSteve `foreach` uses `GetEnumerator` (compiler generated); and `new Queue(IEnumerable)` itself uses `foreach` to enqueue each element so I think these both have the same number of `GetEnumerator` calls. – Kevin Brock Mar 16 '22 at 20:01
  • @KevinBrock - the important part of my comment is the if-test. This avoids doing any work (don’t do either foreach or enumerator) on each leaf (which have no sub-nodes; don’t need enumerating). – ToolmakerSteve Mar 19 '22 at 20:39
3

You could use a non-recursive method like this as well:

  HashSet<Person> GatherAll (Person p) {
     Stack<Person> todo = new Stack<Person> ();
     HashSet<Person> results = new HashSet<Person> ();
     todo.Add (p); results.Add (p);
     while (todo.Count > 0) {
        Person p = todo.Pop (); 
        foreach (Person f in p.Friends) 
           if (results.Add (f)) todo.Add (f);
     }
     return results;
  }

This should handle cycles properly as well. I am starting with a single person, but you could easily expand this to start with a list of persons.

Tarydon
  • 5,097
  • 23
  • 24
2

Here's an implementation that:

  • Does a depth first recursive select,
  • Doesn't require double iteration of the child collections,
  • Doesn't use intermediate collections for the selected elements,
  • Doesn't handle cycles,
  • Can do it backwards.

    public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> rootItems, Func<T, IEnumerable<T>> selector)
    {
        return new RecursiveEnumerable<T>(rootItems, selector, false);
    }
    
    public static IEnumerable<T> SelectRecursiveReverse<T>(this IEnumerable<T> rootItems, Func<T, IEnumerable<T>> selector)
    {
        return new RecursiveEnumerable<T>(rootItems, selector, true);
    }
    
    class RecursiveEnumerable<T> : IEnumerable<T>
    {
        public RecursiveEnumerable(IEnumerable<T> rootItems, Func<T, IEnumerable<T>> selector, bool reverse)
        {
            _rootItems = rootItems;
            _selector = selector;
            _reverse = reverse;
        }
    
        IEnumerable<T> _rootItems;
        Func<T, IEnumerable<T>> _selector;
        bool _reverse;
    
        public IEnumerator<T> GetEnumerator()
        {
            return new Enumerator(this);
        }
    
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    
        class Enumerator : IEnumerator<T>
        {
            public Enumerator(RecursiveEnumerable<T> owner)
            {
                _owner = owner;
                Reset();
            }
    
            RecursiveEnumerable<T> _owner;
            T _current;
            Stack<IEnumerator<T>> _stack = new Stack<IEnumerator<T>>();
    
    
            public T Current
            {
                get 
                {
                    if (_stack == null || _stack.Count == 0)
                        throw new InvalidOperationException();
                    return _current; 
                }
            }
    
            public void Dispose()
            {
                _current = default(T);
                if (_stack != null)
                {
                    while (_stack.Count > 0)
                    {
                        _stack.Pop().Dispose();
                    }
                    _stack = null;
                }
            }
    
            object System.Collections.IEnumerator.Current
            {
                get { return Current; }
            }
    
            public bool MoveNext()
            {
                if (_owner._reverse)
                    return MoveReverse();
                else
                    return MoveForward();
            }
    
            public bool MoveForward()
            {
                // First time?
                if (_stack == null)
                {
                    // Setup stack
                    _stack = new Stack<IEnumerator<T>>();
    
                    // Start with the root items
                    _stack.Push(_owner._rootItems.GetEnumerator());
                }
    
                // Process enumerators on the stack
                while (_stack.Count > 0)
                {
                    // Get the current one
                    var se = _stack.Peek();
    
                    // Next please...
                    if (se.MoveNext())
                    {
                        // Store it
                        _current = se.Current;
    
                        // Get child items
                        var childItems = _owner._selector(_current);
                        if (childItems != null)
                        {
                            _stack.Push(childItems.GetEnumerator());
                        }
    
                        return true;
                    }
    
                    // Finished with the enumerator
                    se.Dispose();
                    _stack.Pop();
                }
    
                // Finished!
                return false;
            }
    
            public bool MoveReverse()
            {
                // First time?
                if (_stack == null)
                {
                    // Setup stack
                    _stack = new Stack<IEnumerator<T>>();
    
                    // Start with the root items
                    _stack.Push(_owner._rootItems.Reverse().GetEnumerator());
                }
    
                // Process enumerators on the stack
                while (_stack.Count > 0)
                {
                    // Get the current one
                    var se = _stack.Peek();
    
                    // Next please...
                    if (se.MoveNext())
                    {
                        // Get child items
                        var childItems = _owner._selector(se.Current);
                        if (childItems != null)
                        {
                            _stack.Push(childItems.Reverse().GetEnumerator());
                            continue;
                        }
    
                        // Store it
                        _current = se.Current;
                        return true;
                    }
    
                    // Finished with the enumerator
                    se.Dispose();
                    _stack.Pop();
    
                    if (_stack.Count > 0)
                    {
                        _current = _stack.Peek().Current;
                        return true;
                    }
                }
    
                // Finished!
                return false;
            }
    
            public void Reset()
            {
                Dispose();
            }
        }
    }
    
Brad Robinson
  • 44,114
  • 19
  • 59
  • 88
1

Recursion is always fun. Perhaps you could simplify your code to:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector) {
    // Stop if subjects are null or empty 
    if (subjects == null || !subjects.Any())
        return Enumerable.Empty<T>();

    // Gather a list of all (selected) child elements of all subjects
    var subjectChildren = subjects.SelectMany(selector);

    // Jump into the recursion for each of the child elements
    var recursiveChildren = SelectRecursive(subjectChildren, selector);

    // Combine the subjects with all of their (recursive child elements).
    // The union will remove any direct parent-child duplicates.
    // Endless loops due to circular references are however still possible.
    return subjects.Union(recursiveChildren);
}

It will result in less duplicates than your original code. However their might still be duplicates causing an endless loop, the union will only prevent direct parent(s)-child(s) duplicates.

And the order of the items will be different from yours :)

Edit: Changed the last line of code to three statements and added a bit more documentation.

Yvo
  • 18,681
  • 11
  • 71
  • 90
  • Interesting... a bit unreadable though, hehe. Ordering doesn't really matter btw, so don't worry about that :p – Svish Jan 06 '10 at 13:29
  • I've split the single statement into subresults, that might make it a bit easier to read/understand. Basicly I've replaced your for-loops with LINQ. Of course you could go wild, and reduce this method to a single line statement :) – Yvo Jan 07 '10 at 10:45
1

use the Aggregate extension...

    List<Person> persons = GetPersons(); 
    List<Person> result = new List<Person>(); 
    persons.Aggregate(result,SomeFunc);

    private static List<Person> SomeFunc(List<Person> arg1,Person arg2)
    {
    arg1.Add(arg2)
    arg1.AddRange(arg2.Persons);
    return arg1;
    }
Chen Kinnrot
  • 20,609
  • 17
  • 79
  • 141
  • I was actually thinking about that a while ago. Care to make some example code? – Svish Jan 06 '10 at 13:30
  • Interesting. This wouldn't handle cyclic relations though, would it? – Svish Jan 06 '10 at 14:36
  • you can add a simple if(arg1.Containes(arg2)) – Chen Kinnrot Aug 23 '10 at 06:53
  • Unless I'm reading the code wrong, this only goes down one level - it doesn't recurse to an arbitrary depth. I believe its equivalent to `foreach (var person in persons) { result.Add(person); result.AddRange(person.Persons); }` – ToolmakerSteve Nov 20 '21 at 03:05
0

While its great to have IEnumerable when there might be a lot of data, its worth remembering the classic approach of recursively adding to a list.

That can be as simple as this (I've left out selector; just demonstrating recursively adding to an output list):

class Node
{
    public readonly List<Node> Children = new List<Node>();

    public List<Node> Flatten()
    {
        var all = new List<Node>();
        Flatten(ref all);
        return all;
    }

    public void Flatten(List<Node> all)
    {
        all.Add(this);

        foreach (var child in Children)
            child.Flatten(all);
    }
}

usage:

Node rootNode = ...;
...
var all = rootNode.Flatten();
ToolmakerSteve
  • 18,547
  • 14
  • 94
  • 196
  • I am a bit baffled why a SO user with a high reputation suggests to use `ref` when `ref` is not needed here at all. `ref` gives the `Flatten` method the freedom [to change the argument for something else](https://stackoverflow.com/a/186907/3936440) which is clearly not used here. – ViRuSTriNiTy Mar 05 '22 at 15:42
  • 1
    @ViRuSTriNiTy - Thanks - you are correct. I don't remember why I used `ref`, but it certainly is not needed here. (Maybe I adapted from code that allowed "null" param, and in that case replaced it with a new instance.) Fixed. – ToolmakerSteve Mar 11 '22 at 18:05