
I am trying to extract the decision rules that lead to each terminal node of a scikit-learn decision tree, and to print code that uses pandas and NumPy to assign the terminal node number to each row. I found a solution that extracts the rules in "How to extract the decision rules from scikit-learn decision-tree?", but I am not sure how to extend it to produce what I need. That question has many answers; the one I am referring to is reproduced below, together with its description.

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# dummy data:
df = pd.DataFrame({'col1':[0,1,2,3],'col2':[3,4,5,6],'dv':[0,1,0,1]})
df
# create decision tree
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df[['col1', 'col2']], df['dv'])

# This function starts with the leaf nodes (identified by -1 in the child arrays) and then recursively finds their parents.
# I call this a node's 'lineage'. Along the way, I grab the values I need to create if/then/else SAS logic:

def get_lineage(tree, feature_names):
     left      = tree.tree_.children_left
     right     = tree.tree_.children_right
     threshold = tree.tree_.threshold
     features  = [feature_names[i] for i in tree.tree_.feature]

     # get ids of child nodes
     idx = np.argwhere(left == -1)[:,0]     

     def recurse(left, right, child, lineage=None):          
          if lineage is None:
               lineage = [child]
          if child in left:
               parent = np.where(left == child)[0].item()
               split = 'l'
          else:
               parent = np.where(right == child)[0].item()
               split = 'r'

          lineage.append((parent, split, threshold[parent], features[parent]))

          if parent == 0:
               lineage.reverse()
               return lineage
          else:
               return recurse(left, right, parent, lineage)

     for child in idx:
          for node in recurse(left, right, child):
               print (node)

get_lineage(dt, df.columns)

When you run the code, it prints:

(0, 'l', 3.5, 'col2')
1
(0, 'r', 3.5, 'col2')
(2, 'l', 1.5, 'col1')
3
(0, 'r', 3.5, 'col2')
(2, 'r', 1.5, 'col1')
(4, 'l', 2.5, 'col1')
5
(0, 'r', 3.5, 'col2')
(2, 'r', 1.5, 'col1')
(4, 'r', 2.5, 'col1')
6

How can I expand it to print something like this:

df['Terminal_Node_Num'] = np.where(df.loc[:,'col2'] <= 3.5, 1, 0)
df['Terminal_Node_Num'] = np.where((df.loc[:,'col2'] > 3.5) & (df.loc[:,'col1'] <= 1.5), 3, df['Terminal_Node_Num'])
df['Terminal_Node_Num'] = np.where((df.loc[:,'col2'] > 3.5) & (df.loc[:,'col1'] > 1.5) & (df.loc[:,'col1'] <= 2.5), 5, df['Terminal_Node_Num'])
df['Terminal_Node_Num'] = np.where((df.loc[:,'col2'] > 3.5) & (df.loc[:,'col1'] > 1.5) & (df.loc[:,'col1'] > 2.5), 6, df['Terminal_Node_Num'])
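One possible direction, as a rough sketch rather than a definitive answer: instead of climbing from each leaf to the root, walk the tree from the root down and collect the conditions along the way, emitting one np.where statement per leaf. The function name leaf_rules_to_code and its frame and target parameters are hypothetical, made up for illustration. The arrays below are hard-coded to mirror the tree structure shown in the printed lineage above (leaves marked with -1 in the child arrays); the -2 placeholders in threshold and feature for leaf nodes are an assumption about how the fitted tree stores them, and the code never reads them.

```python
def leaf_rules_to_code(left, right, threshold, feature, feature_names,
                       frame="df", target="Terminal_Node_Num"):
    """Return one np.where assignment (as a string) per leaf node."""
    statements = []

    def walk(node, conditions):
        node = int(node)
        if left[node] == -1:  # leaf: child arrays hold -1
            if not conditions:
                # Degenerate tree with a single root leaf: nothing to test.
                statements.append(f"{frame}['{target}'] = {node}")
                return
            mask = " & ".join(conditions)
            # The first statement falls back to 0, later ones keep earlier results.
            default = "0" if not statements else f"{frame}['{target}']"
            statements.append(
                f"{frame}['{target}'] = np.where({mask}, {node}, {default})"
            )
            return
        name = feature_names[int(feature[node])]
        thr = float(threshold[node])
        # Left branch is "<= threshold", right branch is "> threshold".
        walk(left[node], conditions + [f"({frame}['{name}'] <= {thr})"])
        walk(right[node], conditions + [f"({frame}['{name}'] > {thr})"])

    walk(0, [])
    return statements


# Tree structure taken from the lineage output above (7 nodes, leaves 1, 3, 5, 6).
left = [1, -1, 3, -1, 5, -1, -1]
right = [2, -1, 4, -1, 6, -1, -1]
threshold = [3.5, -2.0, 1.5, -2.0, 2.5, -2.0, -2.0]
feature = [1, -2, 0, -2, 0, -2, -2]

statements = leaf_rules_to_code(left, right, threshold, feature, ["col1", "col2"])
for statement in statements:
    print(statement)
```

With a fitted tree, the same function should accept dt.tree_.children_left, dt.tree_.children_right, dt.tree_.threshold and dt.tree_.feature in place of the hard-coded lists, with the feature names given in the order used for fitting.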
Usman
Sveta
