0

So, i am trying to create a decision tree out of some data(Pandas DataFrame). The data has some attributes and a label. here is an example:

Outlook   Temp  Humidity Wind    label
Sunny     Hot   High     Weak    No
Sunny     Hot   High     Strong  No
Overcast  Hot   High     Weak    Yes
Rain      Mild  High     Weak    Yes
Rain      Cool  Normal   Weak    Yes
Rain      Cool  Normal   Strong  No
Overcast  Cool  Normal   Strong  Yes
Sunny     Mild  High     Weak    No
Sunny     Cool  Normal   Weak    Yes
Rain      Mild  Normal   Weak    Yes
Sunny     Mild  Normal   Strong  Yes
Overcast  Mild  High     Strong  Yes
Overcast  Hot   Normal   Weak    Yes
Rain      Mild  High     Strong  No

Now here is my code.

def createTree(dataF):
    posibleLabels = (set(dataF['label'].to_list()))
    if len(posibleLabels) == 1:
        return decisionTree(posibleLabels.pop(), []) #This is a leaf node
    else:
        root = bestGain(dataF) #This function determines the root of the sub tree(For testing, return 
        posibleVals = set(dataF[root].to_list())            # any of the column names except "label")                
        newTree = decisionTree(root, posibleVals)
        for v in posibleVals:
            subDataF = dataF.loc[dataF[root] == v] 
            subTree = createTree(subDataF)  #This is the problematic Recursive Call
            newTree.append(subTree, v)
        return newTree

For the small example i provided above, this code works perfectly. But when i try a bigger dataframe, it reaches max recursion depth. Does anyone have an idea to make this tail recursive or maybe make it iterative?

PD. Here is the decisionTree class:

class decisionTree:

    def __init__(self, root, posibleVals):

        self.root = root
        self.posibleVals = posibleVals
        self.childs = {}
        for v in posibleVals:
            self.childs.update({v: None})

    def append(self, subTree, val):
        self.childs.update({val: subTree})

    def __str__(self):
        s = str(self.root) + "( "
        s1 = ""
        for k, v in self.childs.items():
            s1 = s1 + ", " + k
        s = s + s1.lstrip(", ") + " )\n"
        for k, v in self.childs.items():
            s = s + k + "->" +  str(v)
        return s
carloslockward
  • 335
  • 3
  • 8
  • 2
    Note that Python [doesn't support automatic TCO](https://stackoverflow.com/questions/13591970/does-python-optimize-tail-recursion), so you're stuck with the latter option. – Carcigenicate Apr 12 '20 at 20:32
  • 1
    As an aside, variable and function names should generally follow the `lower_case_with_underscores` style, and class names the `CamelCase` style. Also, you're converting Pandas Series to lists in a few areas when it's unnecessary, like `posibleLabels = (set(dataF['label'].to_list()))`, which can simply be `possible_labels = set(data_f["label"])`. – AMC Apr 12 '20 at 21:43

0 Answers0