12

I've been getting RuntimeError: maximum recursion depth exceeded when trying to pickle a highly-recursive tree object. Much like this asker here.

He solved his problem by setting the recursion limit higher with sys.setrecursionlimit. But I don't want to do that: I think that's more of a workaround than a solution. Because I want to be able to pickle my trees even if they have 10,000 nodes in them. (It currently fails at around 200.)

(Also, every platform's true recursion limit is different, and I would really like to avoid opening this can of worms.)

Is there any way to solve this at the fundamental level? If only the pickle module would pickle using a loop instead of recursion, I wouldn't have had this problem. Maybe someone has an idea how I can cause something like this to happen, without rewriting the pickle module?

Any other idea how I can solve this problem will be appreciated.

Community
  • 1
  • 1
Ram Rachum
  • 84,019
  • 84
  • 236
  • 374

4 Answers4

3

I suppose that most people never use recursive structures of such depth. Since easiest serialization implementations are recursive, you're going to only see them.

If I were you I'd not use an openly recursive data structure here. Instead I'd number every node and use a link table that efficiently translates a number to a node with that number. Every node would refer to other nodes (e.g. its children) via that table, using numbers. A simple property would make this syntactically easy. Beyond that properties, no code dealing with tree traversal would have to change. Node constructor will have to allocate a number and put itself into the link table, which is trivial, too.

The link table might be just a list of nodes, where index into the list serves as the node number; Python lists seem to have efficient access by index. If speed of inserts is important, I'd preallocate a long enough list filled with None; it would not take too much space. If nodes stored their own numbers, this structure would be cheaply traversable in both directions.

As you see, pickling and unpickling such a tree would be trivial at any depth.

9000
  • 39,899
  • 9
  • 66
  • 104
  • 3
    So you're saying, avoid having pointers from a node to its children and parent. It would indeed solve the problem, but it will be really annoying not to have pointers. That's compromising on the program's data architecture just because of `pickle`'s problematic implementation. – Ram Rachum Jun 05 '10 at 10:40
  • 2
    Not exactly. This approach will have the same _interface_ as if the pointers were simple python references. It's a matter of a simple property definition, with 'get' operation being fairly efficient. – 9000 Jun 15 '10 at 23:27
  • 1
    @9000 I know it has been 10 years but I tried this method and it significantly slows down the program. – Kaleba KB Keitshokile Mar 17 '21 at 01:52
  • @KalebaKBKeitshokile: Things change, and there's no substitute to measuring real performance. This approach is not about speeding things up, but about tackling excessively deep structures, so deep that Python stack overflows in recursive code. Feel free to offer a better solution what tackles a very deep, or, better yet, cyclical structure: I'd gladly upvote it! – 9000 Mar 31 '21 at 20:00
  • 1
    @9000 I ended up opting to save the nodes and in one array and their neighbours in another (not connected yet) and then only connecting them when loading – Kaleba KB Keitshokile Apr 03 '21 at 01:21
2

To make understanding easier, here's a complete example, with only one link to simplify it:

class Node(object):
  linker = [] # one list for all Node instances
  def __init__(self, payload):
    self.payload = payload
    self.__next = None
    self.__index = len(self.linker)
    self.linker.append(self)
  #
  def getNext(self):
    if self.__next is not None:
      return self.linker[self.__next]
  #
  def setNext(self, another):
    if another is not None:
      self.__next = another.__index
    else:
      self.__next = None
  #
  next = property(getNext, setNext)
  #
  def __str__(self):
    return repr(self.payload)


a = Node("One")
b = Node("Two")
c = Node("Three")

b.next = c
a.next = b

# prints "One" "Two" "Three"
print a, a.next, a.next.next

Also note that this structure can easily contain cycles and still serialize plainly.

9000
  • 39,899
  • 9
  • 66
  • 104
1

Just dont use recursion. Make a stack (list / queue) with open nodes and process this.

Something like this (pseudo code)

stack.add(root)
while not list.empty:
    current = stack.pop
    // process current
    for each child of current:
        stack.add(child)

That should do it

Mene
  • 3,739
  • 21
  • 40
1

I think a good solution is a combination of Mene's and 9000's answers. Given that nodes have globally unique IDs (maybe somehow memory addresses can be used as such) you can do this. Granted this is a sloppy pseudo-implementation, but with a bit of abstraction if encapsulated in a tree class it could be very simple.

def all_nodes(node): # walk the tree and get return all nodes as a list
    if node:
        nodes = []
        for child in node.children:
            for sub_child in all_nodes(child):
                nodes.append(sub_child)
        return nodes
    return []


class Node(object):
    def __init__(self, children, id):
        self.children = children
        self.id = id

    def __getstate__(self): #when pickling translate children into IDs
        tmp = self.__dict__.copy()
        children_ids = []
        for child in tmp['children']:
            children_ids.append(child.id)
        tmp['children_ids'] = children_ids
        return tmp


lookup = dict()


for node in all_nodes(rootNode): # put all nodes into a dictionary
    lookup[node.id] = node
#then pickle the dictionary
#then you can unpickle it and walk the dictionary
for id, node in lookup:
    del node.children
    node.children = []
    for child in node.children_ids:
        node.children.append(lookup[child])
#and three should now be rebuilt
Novikov
  • 4,399
  • 3
  • 28
  • 36