0

I just spent hours banging my head about the following problem in python, and I can't quite figure out why it happened.

Say I'm making a decision tree class:

class DecisionTree:

    def __init__(self, name, children = dict()):
        self.name = name 
        self.children = children # <------- here's the problem

    def add_child(self, child_name, child_tree):
        self.children[child_name] = child_tree

    def pprint(self, depth = 0):
            for childkey in self.children:
                tabs = " ".join(["\t" for x in xrange(depth)])
                print tabs + 'if ' + self.name + ' is ' + str(childkey)
                if (self.children[childkey].__class__.__name__ == 'DecisionTree'): # this is a tree, delve deeper
                    print tabs + 'and...'
                    child = self.children.get(childkey)
                    child.pprint(depth + 1 )
                else:
                    val = self.children.get(childkey)
                    print tabs + 'then predict ' + str( val )
                    print "\n"
            return ''

Now let's build a nonsense tree, and try to print it:

def make_a_tree(depth = 0, counter = 0):
    counter += 1

    if depth > 3:
        return 'bottom'
    else:

        tree = DecisionTree(str(depth)+str(counter))

        for i in range(2):
            subtree = make_a_tree(depth+1, counter)
            tree.add_child(i, subtree)
        return tree

foo = make_a_tree()
foo.pprint()

This code leads to an infinite recursion loop, because the tree structure was (somehow) mistakenly built with the 2nd node of the tree referring to itself.

If I change the line I've marked above (5th one) to tree.children = dict(), then things work properly.

I can't wrap my head around what's happening here. The intention behind the code as written is to take an argument for "children", and if none is passed, create an empty dictionary and use that as children.

I'm pretty new to Python, and I'm trying to make this a learning experience. Any help would be appreciated.

jwdink
  • 4,824
  • 5
  • 18
  • 20
  • 1
    See this answer here: http://stackoverflow.com/questions/5587582/numpy-array-subclass-unexpedly-shares-attributes-across-instances/5592432#5592432 – Joe Kennedy Apr 28 '15 at 19:24
  • Aside: `child = self.children.get(childkey)` / `child.pprint(depth + 1 )` looks weird. `.get` is a dictionary method which returns a default value (None, if not otherwise specified) if the key isn't there. But if the key isn't there, `None.pprint` won't work; and since you're looping over the keys of `self.children`, I think it has to be there. So `self.children[childkey]` should suffice, or you could iterate over the pairs given by `self.children.items()` instead. – DSM Apr 28 '15 at 19:37
  • Thanks— that was just a vestige from when I was desperately debugging, looking for any solution to my problem (I thought for a second that my problem was in the print function… no idea why I thought using get instead of brackets would help). – jwdink Apr 28 '15 at 19:46

2 Answers2

1

The problem is that default arguments for functions (and __init__() is not exempt) are created once. In other words, you are reusing the same dict object every time you create a new DecisionTree.

What you need to do is something like:

class DecisionTree:

def __init__(self, name, children = None):
    self.name = name
    if children is None:
        children = {}
    self.children = children
malfunctioning
  • 435
  • 1
  • 3
  • 10
0

As a solution to your problem i suggest to initialize the default argument children to None, and have a line like: self.children = children or dict()

The problem is that in Python, function arguments are passed by reference, and default arguments which have mutable values (and dictionaries are mutable) are evaluated at function definition time (only once) and not at every call of the function resulting that each time your function is called dict() will return the same reference. Common python gotchas.

CristiFati
  • 38,250
  • 9
  • 50
  • 87
  • Both excellent answers. I'm arbitrarily marking the other one as accepted because it came first. – jwdink Apr 28 '15 at 19:34