
I find myself regularly writing recursive IEnumerable<T> iterators to implement the same "Descendants" pattern as provided by, for example, XContainer.Descendants. The pattern I keep implementing is as follows, given a type Foo with a single-level iterator called Children:

public static IEnumerable<Foo> Descendants(this Foo root) {
    foreach (var child in root.Children()) {
        yield return child;
        foreach (var subchild in child.Descendants()) {
            yield return subchild;
        }
    }
}

This old StackOverflow question suggests the same pattern. But for some reason it feels weird to me to have to reference three levels of hierarchy (root, child, and subchild). Can this fundamental depth-first recursion pattern be further reduced? Or is this an algorithmic primitive of sorts?

The best I can come up with is to abstract the pattern to a generic extension. This doesn't reduce the logic of the iterator pattern presented above, but it does remove the requirement of defining a Descendants method for multiple specific classes. On the downside, this adds an extension method to Object itself, which is a little smelly:

public static IEnumerable<T> SelectRecurse<T>(
    this T root, Func<T, IEnumerable<T>> enumerator) {

    foreach (T item in enumerator(root))
    {
        yield return item;
        foreach (T subitem in item.SelectRecurse(enumerator))
        {
            yield return subitem;
        }
    }
}

// Now we can just write:
foreach(var item in foo.SelectRecurse(f => f.Children())) { /* do stuff */ }
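To see the extension in action, here is a minimal, self-contained sketch; the Node class and its Children method are invented for this example and are not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tree node, used only to demonstrate SelectRecurse.
class Node
{
    public string Name;
    public List<Node> Kids = new List<Node>();
    public IEnumerable<Node> Children() { return Kids; }
}

static class RecurseDemo
{
    // The generic extension from the question.
    public static IEnumerable<T> SelectRecurse<T>(
        this T root, Func<T, IEnumerable<T>> enumerator)
    {
        foreach (T item in enumerator(root))
        {
            yield return item;
            foreach (T subitem in item.SelectRecurse(enumerator))
                yield return subitem;
        }
    }

    static void Main()
    {
        var root = new Node { Name = "root" };
        var a = new Node { Name = "a" };
        var b = new Node { Name = "b" };
        var a1 = new Node { Name = "a1" };
        root.Kids.Add(a);
        root.Kids.Add(b);
        a.Kids.Add(a1);

        // Depth-first, excluding the root itself.
        Console.WriteLine(string.Join(",",
            root.SelectRecurse(n => n.Children()).Select(n => n.Name)));
        // prints "a,a1,b"
    }
}
```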
Joshua Honig
  • Recursive means a function which calls itself. In the first example, take out the two lines with subchild, and replace the remaining yield line with: yield return child.Descendants(); – andrewpm Oct 18 '13 at 13:33
  • @andrewpm Whoops, you're right -- sort of. You can't `yield return child.Descendants()` itself, because that would yield-return an `IEnumerable`, not a `T`. But I do need to call `child.Descendants()`, not `child.Children()`. – Joshua Honig Oct 18 '13 at 13:47
  • Ah I was just trying out some code and came to the same conclusion! Will post a full answer in a moment. – andrewpm Oct 18 '13 at 13:55

7 Answers


You can use an explicit stack, rather than implicitly using the thread's call stack, to store the data that you are using. This can even be generalized to a Traverse method that just accepts a delegate to represent the "get my children" call:

public static IEnumerable<T> Traverse<T>(
    this IEnumerable<T> source
    , Func<T, IEnumerable<T>> childrenSelector)
{
    var stack = new Stack<T>(source);
    while (stack.Any())
    {
        var next = stack.Pop();
        yield return next;
        foreach (var child in childrenSelector(next))
            stack.Push(child);
    }
}

Because this isn't recursive, and thus isn't constantly creating new iterator state machines, it will perform quite a bit better.

Side note: if you want a breadth-first search, just use a Queue instead of a Stack. If you want a best-first search, use a priority queue.

To ensure that siblings are returned in the same order as the selector produces them, rather than in reverse, just add a Reverse call to the result of childrenSelector.
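The side note above can be sketched concretely; this is my paraphrase of the Queue variant (the name TraverseBreadthFirst is mine), assuming the same signature as Traverse:

```csharp
using System;
using System.Collections.Generic;

static class BreadthFirstExtensions
{
    // Identical loop to Traverse, but a FIFO queue yields level order
    // (breadth-first) instead of depth-first.
    public static IEnumerable<T> TraverseBreadthFirst<T>(
        this IEnumerable<T> source,
        Func<T, IEnumerable<T>> childrenSelector)
    {
        var queue = new Queue<T>(source);
        while (queue.Count > 0)
        {
            var next = queue.Dequeue();
            yield return next;
            foreach (var child in childrenSelector(next))
                queue.Enqueue(child);
        }
    }
}
```

For a best-first search, the same loop works with a priority queue keyed on whatever score orders the frontier.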

Servy
  • Similar to Damien_The_Unbeliever's solution, but this takes explicit advantage of a linked-list type collection (in this case the one-way linked-list `Stack` type). – Joshua Honig Oct 18 '13 at 14:50
  • @JoshuaHonig `Stack` is implemented under the hood as an array, not a linked list, which will perform quite a bit better in this context. Also, a one-way linked list would be quite bad here, as you don't have a good way to constantly remove from the end; it would need to be at least a two-way linked list so that you could work backwards through the collection. – Servy Oct 18 '13 at 14:51
  • I think this is probably the way to go - but if there's some logical ordering to the children, it's worth noting that the children are enumerated in reverse order. – Damien_The_Unbeliever Oct 18 '13 at 14:57
  • One detraction: By requiring the input to already be an `IEnumerable`, you force the caller to double-call the loop herself: `foo.Children().Traverse(f => f.Children())`. Can you alter the algorithm such that `this` is simply a `T`? The desired resulting syntax would be simply `foo.Traverse(f => f.Children())`, as with my `SelectRecurse`. – Joshua Honig Oct 18 '13 at 14:58
  • @JoshuaHonig You can if you want. It's as simple as adding the one item to the stack instead of adding a sequence. You can keep both overloads around for convenience as well. – Servy Oct 18 '13 at 14:59
  • Aaah, I just noticed that you snuck in a nested enumeration in your `Stack` constructor, just as Damien_The_Unbeliever does with `AddRange`. :) Whether you call it in the `Stack` constructor or by invoking as `childSelector` is ultimately irrelevant, of course. But yes I see you could just use the no-argument `Stack` constructor and then `Push` the `this` item onto the stack. – Joshua Honig Oct 18 '13 at 15:05
  • I've marked yours as the answer, but see my interpretation that uses a one-way linked list to maintain the original order of the descendants. – Joshua Honig Oct 18 '13 at 16:31
  • @JoshuaHonig If the order is important you can just add in a `Reverse` call, as I said in my edit. Doing so will almost certainly perform better than your code while also being shorter and clearer. – Servy Oct 18 '13 at 16:40
  • Didn't see that edit until you pointed it out. However I am skeptical that adding another layer of state machine (a ``d__99`1`` to be precise) via `Reverse` would outperform a simple link. That said, I'm not up for performance testing it right now :\ – Joshua Honig Oct 18 '13 at 17:00
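Tying the comment thread together: a sketch (my own wording, not Servy's actual code) of the overload whose `this` parameter is a single `T`. Note that unlike the question's `Descendants`, this version yields the root itself; skip the first element if that's unwanted:

```csharp
using System;
using System.Collections.Generic;

static class TraverseExtensions
{
    // Single-item overload: push just the root, then loop as before.
    public static IEnumerable<T> Traverse<T>(
        this T root, Func<T, IEnumerable<T>> childrenSelector)
    {
        var stack = new Stack<T>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var next = stack.Pop();
            yield return next; // includes root itself
            foreach (var child in childrenSelector(next))
                stack.Push(child);
        }
    }
}
```

As with the IEnumerable overload, siblings come back in reverse of the selector's order unless a `Reverse` call is added.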

I think this is a good question. The best explanation I have for why you need two loops: each item expands into multiple items (itself, plus all its descendants). This means we are not mapping one-to-one (like Select) but one-to-many (like SelectMany).

We could write it like this:

public static IEnumerable<Foo> Descendants(this IEnumerable<Foo> items) {
 foreach (var item in items) {
  yield return item;
  foreach (var subitem in item.Children().Descendants())
   yield return subitem;
 }
}

Or like this:

public static IEnumerable<Foo> Descendants(Foo root) {
 var children = root.Children();
 var subchildren = children.SelectMany(c => c.Descendants());
 return children.Concat(subchildren);
}

Or like this:

public static IEnumerable<Foo> Descendants(this IEnumerable<Foo> items) {
 var children = items.SelectMany(c => c.Descendants());
 return items.Concat(children);
}

The versions taking an IEnumerable<Foo> must be invoked on root.Children().

I think all of these rewrites expose a different way of looking at the problem. On the other hand, they all have two nested loops. The loops can be hidden in helper functions but they still exist.

usr

I would manage this with a List:

public static IEnumerable<Foo> Descendants(this Foo root) {
    List<Foo> todo = new List<Foo>();
    todo.AddRange(root.Children());
    while(todo.Count > 0)
    {
        var first = todo[0];
        todo.RemoveAt(0);
        todo.InsertRange(0,first.Children());
        yield return first;
    }
}

Not recursive, so shouldn't blow the stack. You just always add more work for yourself onto the front of the list and so you achieve the depth-first traversal.

Damien_The_Unbeliever
  • A very clever algorithm to achieve recursive enumeration without a recursive function! I think this may actually be a good place to use a linked list rather than a list, however. Internally, `List` keeps all its items in an array. All the `RemoveAt` and `InsertRange` calls will cause all the elements in the `List` to be shifted forwards and backwards -- a theoretically expensive operation when your algorithm never requires fast integer-indexed access to the list's items. – Joshua Honig Oct 18 '13 at 14:39
  • Still has two nested loops because the *Range functions contain a loop. Also it has quadratic complexity because of the list operations at index zero. That could be fixed, though. – usr Oct 18 '13 at 14:43
  • @usr That's true, but especially if it was modified to explicitly take advantage of a LinkedList, it is a very interesting alternative that avoids maintaining a stack. Behind the scenes the closures and state machine classes that the compiler generates to actually implement `IEnumerable` are not trivial. – Joshua Honig Oct 18 '13 at 14:46
  • I was trying to re-write it using `LinkedList` with `First` and `RemoveFirst` - but you have to implement the `AddRange`/`InsertRange` methods yourself. – Damien_The_Unbeliever Oct 18 '13 at 14:48
  • This is close to a good idea, but removing items from the head of a `List` is highly inefficient. What you really need is a `Stack` here. It gives the exact implementation desired, in a much more efficient manner. – Servy Oct 18 '13 at 14:49
  • @JoshuaHonig Yes, a LL would be a bit better here, but even those aren't particularly efficient in this context. A `Stack` is actually the perfect data structure for this task. – Servy Oct 18 '13 at 14:49
  • @Servy - what about the `LinkedList` idea? - I like the appeal of using the list just because you can switch from depth-first to breadth-first traversal just by changing which end you add the child items to. – Damien_The_Unbeliever Oct 18 '13 at 14:50
  • @Damien_The_Unbeliever See my answer. To get breadth-first just use a `Queue` (implemented as a circular array). Actually, since this is removing from the start of the list, it's actually breadth-first, not depth-first, already. Removing from the end of the list would be depth-first (and it would be as efficient as a stack, just not quite as pretty in how it looks). – Servy Oct 18 '13 at 14:53
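Following up the comment thread, here is a `LinkedList<T>`-based sketch (my own generic rendering; the original discussion used `Foo`) that stays depth-first and preserves sibling order without a `Reverse` call, at the cost of a little splicing bookkeeping:

```csharp
using System;
using System.Collections.Generic;

static class LinkedListTraversal
{
    // Depth-first descendants (root excluded), siblings in original order.
    public static IEnumerable<T> Descendants<T>(
        this T root, Func<T, IEnumerable<T>> children)
    {
        var todo = new LinkedList<T>(children(root));
        while (todo.Count > 0)
        {
            var first = todo.First.Value;
            todo.RemoveFirst();

            // Splice first's children in at the front, keeping their order.
            var mark = todo.First; // old second element, may be null
            foreach (var child in children(first))
            {
                if (mark == null) todo.AddLast(child);
                else todo.AddBefore(mark, child);
            }
            yield return first;
        }
    }
}
```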

Both Damien_The_Unbeliever and Servy have presented versions of an algorithm that avoid creating a recursive call stack by using collections of one type or another. Damien's use of a List could cause poor performance due to the inserts at the head of the list, while Servy's use of a stack will cause nested elements to be returned in reverse order. I believe manually implementing a one-way linked list will maintain Servy's performance while still returning all the items in the original order. The only tricky part is initializing the first ForwardLinks by iterating the root. To keep Traverse clean I moved that to a constructor on ForwardLink.

public static IEnumerable<T> Traverse<T>(
    this T root, 
    Func<T, IEnumerable<T>> childrenSelector) {

    var head = new ForwardLink<T>(childrenSelector(root));

    if (head.Value == null) yield break; // No items from root iterator

    while (head != null)
    {
        var headValue = head.Value;
        var localTail = head;
        var second = head.Next;

        // Insert new elements immediately behind head.
        foreach (var child in childrenSelector(headValue))
            localTail = localTail.Append(child);

        // Splice on the old tail, if there was one
        if (second != null) localTail.Next = second;

        // Pop the head
        yield return headValue;
        head = head.Next; 
    }
}

public class ForwardLink<T> {
    public T Value { get; private set; }
    public ForwardLink<T> Next { get; set; }

    public ForwardLink(T value) { Value = value; }

    public ForwardLink(IEnumerable<T> values) { 
        bool firstElement = true;
        ForwardLink<T> tail = null;
        foreach (T item in values)
        {
            if (firstElement)
            {
                Value = item;
                firstElement = false;
                tail = this;
            }
            else
            {
                tail = tail.Append(item);
            }
        }
    }

    public ForwardLink<T> Append(T value) {
        return Next = new ForwardLink<T>(value);
    } 
}
Joshua Honig
  • If you want to do a left-to-right rather than a right-to-left traverse you can just stick a `Reverse` call after getting the children; it's that easy. Linked lists, in general, are going to perform quite a bit worse than an array-based data structure due to the increased pressure on the GC, the loss of memory locality, the increased memory overhead from all of the additional object headers, etc. The code is also harder to read; it's not as verifiably correct. – Servy Oct 18 '13 at 16:33
  • Also, `Append` is really a misnomer here. It *should* be inserting the item to be right after the next element, instead it's overwriting the next element with itself, which isn't appending it. – Servy Oct 18 '13 at 16:38
  • @Servy It is not "overwriting" `Next`. `Next` is `null` until you have `Append`ed a value. This just follows the same useful convention as, for example, `XmlElement.AppendChild()` which returns a reference to the thing just appended. In this case it is returning a reference to the `Link` that wraps the thing appended. – Joshua Honig Oct 20 '13 at 00:09
  • @Servy Excellent points about the *reasons* why a linked list would be less optimal! SO is at its best when it helps people really learn. – Joshua Honig Oct 20 '13 at 00:14
  • `Append` doesn't *need* to be called when `next` is `null`. The first time you call it, it sets the next value. The second time, it overwrites the previously existing value. Now in your particular use of it you make sure that doesn't happen, but that's the point: you need to make sure not to do that, and that's because `Append` doesn't actually append the node, it replaces the existing next node, if any, with a new node. – Servy Oct 20 '13 at 16:52

I propose a different version, without using yield:

    public abstract class RecursiveEnumerator : IEnumerator {
        public RecursiveEnumerator(ICollection collection) {
            this.collection = collection;
            this.enumerator = collection.GetEnumerator();
        }

        protected abstract ICollection GetChildCollection(object item);

        public bool MoveNext() {
            if (enumerator.Current != null) {
                ICollection child_collection = GetChildCollection(enumerator.Current);
                if (child_collection != null && child_collection.Count > 0) {
                    stack.Push(enumerator);
                    enumerator = child_collection.GetEnumerator();
                }
            }
            while (!enumerator.MoveNext()) {
                if (stack.Count == 0) return false;
                enumerator = stack.Pop();
            }
            return true;
        }

        public virtual void Dispose() { }

        public object Current { get { return enumerator.Current; } }

        public void Reset() {
            stack.Clear();
            enumerator = collection.GetEnumerator();
        }

        private IEnumerator enumerator;
        private Stack<IEnumerator> stack = new Stack<IEnumerator>();
        private ICollection collection;
    }

Usage example

    public class RecursiveControlEnumerator : RecursiveEnumerator, IEnumerator {
        public RecursiveControlEnumerator(Control.ControlCollection controlCollection)
            : base(controlCollection) { }

        protected override ICollection GetChildCollection(object c) {
            return (c as Control).Controls;
        }
    }
rst256

To expand on my comment, this should work:

public static IEnumerable<Foo> Descendants(this Foo node)
{
    yield return node; // return branch nodes
    foreach (var child in node.Children())
        foreach (var c2 in child.Descendants())
            yield return c2; // return leaf nodes
}

That will return all branch nodes and leaf nodes. If you only want to return leaf nodes, remove the first yield return.

In response to your question, yes it is an algorithmic primitive, because you definitely need to call node.Children(), and you definitely need to call child.Descendants() on each child. I agree that it seems odd having two "foreach" loops, but the second one is actually just continuing the overall enumeration, not iterating the children.

andrewpm
  • Very close, but this ends up yield-returning the root node itself. Although I am a descendant of my grandfather, my grandfather is not a `Descendant` of himself. Also, although you have removed some curly braces, you have not reduced the number of steps in the algorithm. – Joshua Honig Oct 18 '13 at 14:13

Try this:

private static IEnumerable<T> Descendants<T>(
    this IEnumerable<T> children, Func<T, IEnumerable<T>> enumerator)
{
    Func<T, IEnumerable<T>> getDescendants =
        child => enumerator(child).Descendants(enumerator);

    Func<T, IEnumerable<T>> getChildWithDescendants =
        child => new[] { child }.Concat(getDescendants(child));

    return children.SelectMany(getChildWithDescendants);
}

Or if you prefer the non Linq variant:

private static IEnumerable<T> Descendants<T>(
    this IEnumerable<T> children, Func<T, IEnumerable<T>> enumerator)
{
    foreach (var child in children)
    {
        yield return child;

        var descendants = enumerator(child).Descendants(enumerator);

        foreach (var descendant in descendants)
        {
            yield return descendant;
        }
    }
}

And call it like:

root.Children().Descendants(f => f.Children())
abto
  • That only goes to a depth of two, not through the entire tree. – Servy Oct 18 '13 at 14:56
  • Now it's almost exactly the OP's code (just written with extra local variables), and has all of it's problems. – Servy Oct 18 '13 at 15:09
  • As you can see, I eliminated the call to Foo.Children() and the smell of adding an extension method to object. So this solution is both free of dependencies on a specific class (as achieved in the original post) **and** is not polluting any object with the extension method. – abto Oct 18 '13 at 15:15
  • That's still not fixing the problem with the code that the OP is looking for, it's just a style change. One that I made myself in my answer as well, but that change alone doesn't address the fact that recursive iterator blocks (which is what you're doing) are highly inefficient. – Servy Oct 18 '13 at 15:18
  • Okay, I missed the point that reducing recursion was a goal to achieve. In this case your answer makes the most sense. – abto Oct 18 '13 at 15:37