0

I'm probably just trying to do something crazy here, so let me explain my use case first:

I've got an object graph in Ruby, consisting only of the basic JSON types (strings, numbers, arrays, hashes, trues, falses, nils). I'd like ultimately to serialize this graph to JSON.

The problem is that I don't have control over the origin of all objects in the graph. This means that some of the strings contained in the object graph might be tagged with the wrong encodings (for example, a string that's actually just a bunch of random garbage bytes ends up tagged with a UTF-8 encoding). This will cause the JSON serialization to fail (since JSON only supports UTF-8 encoded strings).

I have a strategy for handling these problematic strings, which basically consists of replacing them with a transformed version of each string (the exact transformation isn't really relevant).

In order to apply this transformation to strings, I need to walk the entire object graph and find all of them. This is trivial to implement recursively using standard depth-first search. One wrinkle is that I'd like to avoid mutating the original object graph or any strings therein, so I'm basically building a copy of the object graph as I traverse it (with only the non-problematic leaf nodes being referenced directly from the new graph, and all other nodes being duped).

This all works, and is reasonably efficient, save the duplication of non-leaf nodes in the transformed object graph. The problem is that it sometimes gets fed very deeply-nested object graphs, so the recursion will on occasion produce a SystemStackError.

I've implemented a non-recursive solution using DFS with a stack of Enumerator objects, but it seems to be dramatically slower than the recursive solution (presumably on account of the extra object allocations for the Enumerators and the silly StopIteration exceptions that get raised at the end of each Enumerator.

Breadth-first search seems inappropriate, because I don't think there's a way to determine the path back up to the root when visiting a given node, which I think I need in order to build a copy of the tree.

Am I wrong about BFS here? Are there other techniques that I could be using to accomplish this traversal without recursion? Is this all just loony?

grumbler
  • 886
  • 9
  • 20
  • Why do you need to traverse the stack to make a copy? To copy a graph, keep a hash mapping original nodes to copied nodes. Search the original. Check the destination of each edge to see if it's in the hash. If it's not, copy the destination, put an original->copy pair in the hash, and add the new copy to the search queue/stack. If it is already in the hash, just set the destination pointer to the copy. With this algorithm, the search you use doesn't matter. A stack of Enumerators is crazy. Keep a stack of fully unexplored children, not partially explored parents! – Gene Aug 09 '14 at 01:53
  • How deeply nested does a JSON object need to be to give you a `SystemStackError`? I'm impressed. – Max Aug 09 '14 at 11:56

1 Answers1

0

Instead of using recursion you could use a stack explicitly see here for more details:

Way to go from recursion to iteration

http://haacked.com/archive/2007/03/04/Replacing_Recursion_With_a_Stack.aspx/

Community
  • 1
  • 1
ShaneQful
  • 2,140
  • 1
  • 16
  • 22