So, i am trying to create a decision tree out of some data(Pandas DataFrame). The data has some attributes and a label. here is an example:
Outlook Temp Humidity Wind label
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
Now here is my code.
def createTree(dataF):
posibleLabels = (set(dataF['label'].to_list()))
if len(posibleLabels) == 1:
return decisionTree(posibleLabels.pop(), []) #This is a leaf node
else:
root = bestGain(dataF) #This function determines the root of the sub tree(For testing, return
posibleVals = set(dataF[root].to_list()) # any of the column names except "label")
newTree = decisionTree(root, posibleVals)
for v in posibleVals:
subDataF = dataF.loc[dataF[root] == v]
subTree = createTree(subDataF) #This is the problematic Recursive Call
newTree.append(subTree, v)
return newTree
For the small example i provided above, this code works perfectly. But when i try a bigger dataframe, it reaches max recursion depth. Does anyone have an idea to make this tail recursive or maybe make it iterative?
PD. Here is the decisionTree class:
class decisionTree:
def __init__(self, root, posibleVals):
self.root = root
self.posibleVals = posibleVals
self.childs = {}
for v in posibleVals:
self.childs.update({v: None})
def append(self, subTree, val):
self.childs.update({val: subTree})
def __str__(self):
s = str(self.root) + "( "
s1 = ""
for k, v in self.childs.items():
s1 = s1 + ", " + k
s = s + s1.lstrip(", ") + " )\n"
for k, v in self.childs.items():
s = s + k + "->" + str(v)
return s