
I've got a class that looks like this:

public class SourceObject
{
    public string Id { get; set; }
    public List<SourceObject> Children { get; set; }

    public SourceObject()
    {
        Children = new List<SourceObject>();
    }
}

As you can see, it has a property that contains a list of further instances of this same class. The data I'm dealing with for this class means that the number of children is unknown until runtime and the overall "depth" of the resulting object graph is also unknown.

I need to create a "mapping" from the object graph of SourceObject to a similarly shaped graph of DestinationObjects (similar to how AutoMapper might map from one object to another).

I've got a method that will map from my Source graph to my Destination graph; however, this method uses recursion:

// Recursive way of mapping each Source object to Destination
public static DestinationObject MapSourceToDestination(SourceObject source)
{
    var result = new DestinationObject();
    result.Id = source.Id;
    result.Children = source.Children.Select(MapSourceToDestination).ToList();
    return result;
}

This works fine when the source object graph isn't too large or deep; however, when it is very large, this method throws a StackOverflowException.

I have managed to create an alternative version of this function that removes the recursion and replaces it with a Queue/Stack, using a technique similar to the one described in this answer; however, I've noticed that the Queue/Stack can also grow very large, and I'm not sure my implementation is the most efficient.

Is it possible to convert the recursive function to one that purely uses iteration over the source object graph (i.e. removing recursion and ideally, the use of a Queue/Stack)?

Bud Goode
  • Yes, you can, but you have to rescan the whole thing multiple times (a sketch of this multi-pass idea follows these comments)... If for example you have 1, 2, 3 and 1 depends on 2 that depends on 3, you could do: 1 (has unresolved dependencies, skip for now), 2 (has unresolved dependencies, skip for now), 3 (resolve), then return to 1 (still has unresolved dependencies, skip for now), 2 (resolve), return to 1 (resolve) – xanatos Mar 11 '17 at 16:44
  • Something like [this](http://stackoverflow.com/questions/42425817/iterating-all-items-in-nested-list-and-returning-new-list-of-different-type/42426580#42426580)? – Ivan Stoev Mar 11 '17 at 16:45
  • @IvanStoev That uses a *Stack<>*, so it is against the *ideally, the use of a Queue/Stack* – xanatos Mar 11 '17 at 16:50
  • @IvanStoev OP asks for solution without stack. – Evk Mar 11 '17 at 16:50
  • @Evk I know (although he said *ideally*). But that solution is different from Eric's implementation and the stack size is purely the depth of the tree, and IMO is the optimal. Good luck to OP with the pure iterative solution though :) – Ivan Stoev Mar 11 '17 at 16:51
  • @IvanStoev Multi-iterative solution IS possible... As I've written... You rescan the whole collection multiple times... – xanatos Mar 11 '17 at 17:24
  • @bud I'll add (as a sidenote) that in the worst case the *Stack<>* has Count = n - 1 where n is the number of items... So if you are keeping your items in memory (in a *List<>* for example), you aren't wasting too much memory (you are at most doubling the size of the memory used by the *List<>*). A 1,000,000 big stack should be 8mb of memory (at 64 bits)... this to give some numbers (technically it could be 16mb of memory, because *Stack<>* probably grows by doubling, so it could have a capacity of 2,000,000 elements) – xanatos Mar 11 '17 at 17:25
  • @xanatos First, you are assuming you have access to the whole set (like a db table) including all the parents and children, which I don't see. Second, even if you do have access, you'll solve the stack overflow and memory issue, but will run into a time issue. And finally, the solution I'm proposing is using a `Stack<>` of enumerators and in the worst case will have 2 * depth elements - note the **depth**, not **N**. – Ivan Stoev Mar 11 '17 at 17:33
  • @IvanStoev I know that I'm exchanging an O(N) (worst case) in space (the Stack-based one you suggested) with an O(N^2) in time (rescanning the list multiple times). Note that both solutions will even require O(N) space for the items that have been done/are postponed (or at least for a flag that says "done") – xanatos Mar 11 '17 at 17:56
  • @xanatos No doubt you know :) Just can't understand why you insist on N for stack solution. Either you are missing the `IEnumerator` part, or I don't know. It's true that N is the worst case (the whole set consists of single parent having single child having single child etc.), but in the average case max depth (or height) of the tree is order of magnitude less than N. Anyway, thanks for the discussion, appreciate. – Ivan Stoev Mar 11 '17 at 18:10
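
A minimal sketch of the multi-pass idea xanatos describes in the comments above (not part of the original posts). It assumes every node is already available in a flat collection (e.g. loaded from a database table), which the question itself doesn't guarantee, and that the graph is acyclic with no duplicate nodes. Each pass maps only the nodes whose children have all been mapped already, so it needs no recursion and no explicit Stack/Queue:

// Requires System.Collections.Generic and System.Linq.
// allNodes: a flat list containing every node of the graph (hypothetical precondition).
public static DestinationObject MapMultiPass(IReadOnlyList<SourceObject> allNodes, SourceObject root)
{
    var mapped = new Dictionary<SourceObject, DestinationObject>();

    // Keep rescanning until every node has been mapped:
    // the first pass maps the leaves, the next pass their parents, and so on.
    while (mapped.Count < allNodes.Count)
    {
        foreach (var node in allNodes)
        {
            if (mapped.ContainsKey(node)) continue;                        // already resolved
            if (!node.Children.All(c => mapped.ContainsKey(c))) continue;  // unresolved children, skip for now

            mapped[node] = new DestinationObject
            {
                Id = node.Id,
                Children = node.Children.Select(c => mapped[c]).ToList()
            };
        }
    }

    return mapped[root];
}

This is the trade-off discussed above: roughly O(N^2) time in the worst case (a single long chain) in exchange for dropping the explicit stack, plus O(N) space for the dictionary.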

4 Answers


I still believe that a stack whose size is bounded by the max depth of the tree is the optimal general solution.

But interestingly, the data structure and the concrete process contain all the necessary information to implement the conversion without an explicit stack, just based on Children.Count. Let's see what we need:

(1) Are there more source children to process: source.Children.Count != target.Children.Count

(2) Which is the next source child to process: source.Children[target.Children.Count]

(3) What is the index of the child currently being processed: target.Children.Count - 1

Note that the above rules apply for any level during the processing.

Here is the implementation:

public static DestinationObject MapSourceToDestination(SourceObject source)
{
    // Map everything except children
    Func<SourceObject, DestinationObject> primaryMap = s => new DestinationObject
    {
        Id = s.Id,
        // ...
        Children = new List<DestinationObject>(s.Children.Count) // Empty list with specified capacity
    };

    var target = primaryMap(source);

    var currentSource = source;
    var currentTarget = target;
    int depth = 0;
    while (true)
    {
        if (currentTarget.Children.Count != currentSource.Children.Count)
        {
            // Process next child
            var sourceChild = currentSource.Children[currentTarget.Children.Count];
            var targetChild = primaryMap(sourceChild);
            currentTarget.Children.Add(targetChild);
            if (sourceChild.Children.Count > 0)
            {
                // Move one level down
                currentSource = sourceChild;
                currentTarget = targetChild;
                depth++;
            }
        }
        else
        {
            // Move one level up
            if (depth == 0) break;
            depth--;
            currentSource = source;
            currentTarget = target;
            for (int i = 0; i < depth; i++)
            {
                int index = currentTarget.Children.Count - 1;
                currentSource = currentSource.Children[index];
                currentTarget = currentTarget.Children[index];
            }
        }
    }

    return target;
}

The only tricky (and partially inefficient) part is the moving-up step (which is why the general solution requires a stack). If the objects had a Parent property, it would simply be:

currentSource = currentSource.Parent;
currentTarget = currentTarget.Parent;

Lacking such properties, to find the parents of the current source and target items we start from the root items and move down through the currently processed index at each level (see (3)) until we hit the desired depth.
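
As a quick usage sketch (not part of the original answer), here is a hypothetical test: a degenerate chain deep enough that the recursive version would typically overflow the default 1 MB stack. The iterative version completes, since neither the call stack nor any explicit stack grows with the depth (note that a pure chain is also the worst case for the root-walk in the move-up step, so it runs slower than a bushier graph of the same size):

// Hypothetical test data: a 20,000-level chain of single children.
var root = new SourceObject { Id = "0" };
var current = root;
for (int i = 1; i < 20000; i++)
{
    var child = new SourceObject { Id = i.ToString() };
    current.Children.Add(child);
    current = child;
}

var mapped = MapSourceToDestination(root);   // no StackOverflowException
Console.WriteLine(mapped.Children[0].Id);    // prints "1"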

Ivan Stoev

I don't think that a function that purely uses iteration is better in itself, but I'd implement it with a couple of extension methods:

// Extension methods need to live in a (non-generic, non-nested) static class
public static class MappingExtensions
{
    public static SourceObject GetAtList(this SourceObject s, List<int> cycleRef)
    {
        // Walk down the source graph following the index path in cycleRef
        var ret = s;
        for (int i = 0; i < cycleRef.Count; i++)
        {
            ret = ret.Children[cycleRef[i]];
        }
        return ret;
    }

    public static void SetAtList(this DestinationObject d, List<int> cycleRef, SourceObject s)
    {
        // Walk down the destination graph to the parent described by cycleRef,
        // then append the mapped child
        var ret = d;
        for (int i = 0; i < cycleRef.Count - 1; i++)
        {
            ret = ret.Children[cycleRef[i]];
        }
        ret.Children.Add(new DestinationObject() { Id = s.Id });
    }
}

and lists of indices that play the role of iterators:

public static DestinationObject MapSourceToDestinationIter(SourceObject source)
{
    var result = new DestinationObject();
    result.Id = source.Id;
    if (source.Children.Count == 0)
    {
        return result;
    }
    List<int> cycleTot = new List<int>();
    List<int> cycleRef = new List<int>();
    cycleRef.Add(0);
    cycleTot.Add(source.Children.Count-1);
    do
    {
        // Current source node at the index path described by cycleRef
        var curr = source.GetAtList(cycleRef);
        result.SetAtList(cycleRef, curr);
        if (curr.Children.Count == 0)
        {
            // Leaf: move to the next sibling, popping levels whose siblings are exhausted
            cycleRef[cycleRef.Count - 1]++;
            while (cycleRef[cycleRef.Count - 1] > cycleTot[cycleTot.Count - 1])
            {
                cycleRef.RemoveAt(cycleRef.Count - 1);
                cycleTot.RemoveAt(cycleTot.Count - 1);
                if (cycleRef.Count == 0)
                {
                    break;
                }
                cycleRef[cycleRef.Count - 1]++;
            }
        }
        else
        {
            // Descend to the first child of the current node
            cycleRef.Add(0);
            cycleTot.Add(curr.Children.Count - 1);
        }
    } while (cycleTot.Count > 0);
    return result;
}

I wouldn't necessarily suggest going this way, but it may be faster than the LINQ alternative...

Anyway, explicit usage of a Stack (as in Ivan Stoev's answer) would be the optimal solution.

Community

Is it possible to convert the recursive function to one that purely uses iteration over the source object graph (i.e. removing recursion and ideally, the use of a Queue/Stack)?

Replacing the recursive LINQ Select call with explicit iteration is achievable with a stack (or a queue). I used a Tuple to keep each destination node together with the source node it was mapped from.

Run time: O(n). Space complexity with a stack (depth-first): O(min(h*b, n)), where h is the height of the tree and b is the maximal number of children of a node. Using a queue instead (breadth-first), the space complexity becomes O(the number of nodes in the widest level). Consider this code:

public DestinationObject MapSourceToDestination(SourceObject root)
{
    // Each stack entry pairs a source node with the destination node mapped from it
    var stack = new Stack<Tuple<SourceObject, DestinationObject>>();

    var result = new DestinationObject { Id = root.Id, Children = new List<DestinationObject>() };
    stack.Push(Tuple.Create(root, result));

    while (stack.Count > 0)
    {
        var current = stack.Pop();
        var currentSource = current.Item1;
        var currentDestination = current.Item2;

        foreach (SourceObject sourceChild in currentSource.Children)
        {
            var destinationChild = new DestinationObject { Id = sourceChild.Id, Children = new List<DestinationObject>() };
            currentDestination.Children.Add(destinationChild);
            stack.Push(Tuple.Create(sourceChild, destinationChild));
        }
    }

    return result;
}

Dolev

Should I dare to say... you have conflicting requirements, and therefore your problem is in the requirements/design, not in the code? Consider the two points you have mentioned in the question:

  1. You say that the number of children of a SourceObject is unknown until run-time. In that case, the possibility of running out of space (a stack overflow or out-of-memory error) is unavoidable. That is what happens when the size of the data is unknown and at run-time it turns out to be larger than the space available on the machine.

  2. Further, irrespective of your liking, a Stack or a Queue is the correct data structure for this kind of processing if you want to avoid recursion. You either use recursion, or you store your SourceObjects in some data structure that keeps track of which one to visit next as you continue the processing (see the queue-based sketch below).
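
For illustration (not from the original answer), here is a minimal queue-based sketch of point 2; it mirrors the stack-based answer above, just visiting the graph breadth-first. The method name MapWithQueue is hypothetical:

// Breadth-first mapping with an explicit Queue instead of recursion.
public static DestinationObject MapWithQueue(SourceObject root)
{
    var result = new DestinationObject { Id = root.Id, Children = new List<DestinationObject>() };
    var queue = new Queue<Tuple<SourceObject, DestinationObject>>();
    queue.Enqueue(Tuple.Create(root, result));

    while (queue.Count > 0)
    {
        // Each entry pairs a source node with the destination node mapped from it
        var pair = queue.Dequeue();
        foreach (var sourceChild in pair.Item1.Children)
        {
            var destinationChild = new DestinationObject { Id = sourceChild.Id, Children = new List<DestinationObject>() };
            pair.Item2.Children.Add(destinationChild);
            queue.Enqueue(Tuple.Create(sourceChild, destinationChild));
        }
    }

    return result;
}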

I would go with the Stack/Queue method over recursion for graph exploration or traversal, and stay aware of the fact that if the graph is big enough, my Stack/Queue will consume all the system memory and fail with an out-of-memory error.

To avoid that, either increase the memory on your machine (i.e. scale up) or increase the number of machines that do the work for you while parallelizing your algorithm at the same time (i.e. scale out).

displayName