
I am using Graphviz to plot a classification decision tree.

Before fitting, I scale the features with preprocessing.StandardScaler().

Therefore, when I plot the decision tree, the nodes show the transformed values.

Is there a way to "inverse_transform" the classifier before plotting it, so that the decision tree shows the actual values at the nodes rather than the transformed ones?

Yes, I have tried scale.inverse_transform(rf_clf)... but of course it doesn't work.

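import numpy as np
import numpy.random as nr
import pandas as pd
import graphviz
import sklearn.model_selection as ms
from sklearn import datasets, preprocessing, tree

# Import the dataset from sklearn.datasets
iris = datasets.load_iris()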

# Create a data frame from the dictionary
species = [iris.target_names[x] for x in iris.target]
iris = pd.DataFrame(iris['data'], columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
iris['Species'] = species

# Convert to arrays
Features = np.array(iris[['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']])

levels = {'setosa':0, 'versicolor':1, 'virginica':2}
Labels = np.array([levels[x] for x in iris['Species']])

# Split into training and test sets (100 test samples)
nr.seed(1115)
indx = range(Features.shape[0])
indx = ms.train_test_split(indx, test_size = 100)
X_train = Features[indx[0],:]
y_train = np.ravel(Labels[indx[0]])
X_test = Features[indx[1],:]
y_test = np.ravel(Labels[indx[1]])

# Scale the features (scaler fitted on the training set only)
scale = preprocessing.StandardScaler()
scale.fit(X_train)
X_train = scale.transform(X_train)

# Fit the classifier
rf_clf = tree.DecisionTreeClassifier()  # a single decision tree
rf_clf.fit(X_train, y_train)

# Plot the decision tree with Graphviz
dot_data = tree.export_graphviz(rf_clf, out_file=None,
                                feature_names=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'],
                                class_names=['setosa', 'versicolor', 'virginica'],
                                filled=True, rounded=True,
                                special_characters=True)

print(dot_data)

graph = graphviz.Source(dot_data)
graph  # displayed inline when this is the last expression in a Jupyter cell

The first node reads "Petal_Width <= 0.53" and the second "Petal_Length <= -0.788", which is a negative value for a physical length.

I would prefer the tree to show the real values in the original units (centimeters for the iris data).
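For reference, StandardScaler standardizes each feature j with the mean μ_j and standard deviation σ_j it learned from X_train (exposed as scale.mean_[j] and scale.scale_[j]). Because σ_j > 0, this is an increasing affine map, so a split on the scaled feature corresponds to a split on the original feature at the back-transformed threshold; a negative scaled threshold only means "below the training-set mean".

```latex
z_j = \frac{x_j - \mu_j}{\sigma_j}
\quad\Longleftrightarrow\quad
x_j = \mu_j + \sigma_j\, z_j,
\qquad
z_j \le t \;\iff\; x_j \le \mu_j + \sigma_j\, t .
```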

CRAZYDATA

1 Answer


You could traverse the tree and set each node's threshold yourself.

Consider this example of traversing the tree: https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html#sphx-glr-auto-examples-tree-plot-unveil-tree-structure-py

Where it says print("%snode=%s test node: go to node %s if X[:, %s] <= %s else to node %s."..., you could instead rewrite the threshold, using the scaler's inverse_transform function for the feature under test:

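# Inside the example's loop over node ids i, for split (non-leaf) nodes only, with
# feature = rf_clf.tree_.feature and threshold = rf_clf.tree_.threshold:
transformed = np.full((1, X_train.shape[1]), np.nan)  # inverse_transform expects a 2-D array
transformed[0, feature[i]] = threshold[i]
threshold[i] = scale.inverse_transform(transformed)[0, feature[i]]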

The generated DOT output will then contain the updated values. You won't be able to use the modified tree for prediction with scaled features anymore, though, since its thresholds are now in the original units.
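A self-contained sketch of the same idea, assuming a recent scikit-learn in which `tree_.threshold` is a writable view into the fitted tree (the tree-structure example linked above reads the same `tree_` arrays). It deep-copies the classifier first, so the original keeps working on scaled inputs while the copy works on unscaled ones, and it uses the scaler's `mean_` and `scale_` directly, which for a single feature is equivalent to `inverse_transform`. The helper name `unscale_tree` and the `random_state` values are hypothetical choices for illustration, not part of the original code.

```python
import copy

from sklearn import datasets, preprocessing, tree
from sklearn.model_selection import train_test_split

# Set-up mirroring the question; random_state values are arbitrary, for reproducibility.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=100, random_state=1115)

scale = preprocessing.StandardScaler().fit(X_train)
rf_clf = tree.DecisionTreeClassifier(random_state=0)
rf_clf.fit(scale.transform(X_train), y_train)


def unscale_tree(clf, scaler):
    """Return a deep copy of a fitted tree with split thresholds in original units.

    Hypothetical helper: assumes `scaler` is the fitted StandardScaler that the
    tree's training data went through.
    """
    clf_raw = copy.deepcopy(clf)  # keep the original tree usable on scaled data
    t = clf_raw.tree_
    left, right = t.children_left, t.children_right
    feature, threshold = t.feature, t.threshold  # views into the copy's node array
    for i in range(t.node_count):
        if left[i] == right[i]:
            continue  # leaf: feature/threshold hold placeholder values
        j = feature[i]
        # Inverse of z = (x - mean_) / scale_ for the feature under test
        threshold[i] = threshold[i] * scaler.scale_[j] + scaler.mean_[j]
    return clf_raw


rf_raw = unscale_tree(rf_clf, scale)

dot_data = tree.export_graphviz(
    rf_raw, out_file=None,
    feature_names=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'],
    class_names=['setosa', 'versicolor', 'virginica'],
    filled=True, rounded=True, special_characters=True)
print(dot_data)
```

Passing unscaled features to `rf_raw.predict` should give the same predictions as `rf_clf` on scaled features, except possibly for samples lying within floating-point rounding of a threshold.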

Note: the recovered thresholds aren't exactly the same as those of a tree trained without scaling; I'm not sure whether the scaler should influence the thresholds like that.
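One possible explanation, offered as an assumption rather than a diagnosis: the scikit-learn documentation notes that DecisionTreeClassifier randomly permutes the features at every split, so when two candidate splits improve the criterion equally (for example, on iris both a Petal_Length split and a Petal_Width split separate setosa perfectly), which one is chosen can change between fits unless `random_state` is fixed; `fit` also converts X to float32 internally. Standardization is an increasing affine map per feature, so it preserves the ordering of values and the midpoints between them, which suggests that with the same `random_state` both fits should pick the same splits and the back-transformed thresholds should differ only by floating-point rounding. This sketch checks that on the full iris data; `random_state=0` is an arbitrary choice.

```python
import numpy as np
from sklearn import datasets, preprocessing, tree

X, y = datasets.load_iris(return_X_y=True)
scaler = preprocessing.StandardScaler().fit(X)

# Same random_state for both fits, so features are visited in the same order
raw_tree = tree.DecisionTreeClassifier(random_state=0).fit(X, y)
scaled_tree = tree.DecisionTreeClassifier(random_state=0).fit(scaler.transform(X), y)

t_raw, t_scaled = raw_tree.tree_, scaled_tree.tree_
split = t_scaled.children_left != t_scaled.children_right  # non-leaf nodes

# Map scaled thresholds back to original units: x = z * scale_ + mean_
recovered = t_scaled.threshold.copy()
features = t_scaled.feature[split]
recovered[split] = recovered[split] * scaler.scale_[features] + scaler.mean_[features]

same_structure = (t_raw.node_count == t_scaled.node_count
                  and np.array_equal(t_raw.feature, t_scaled.feature))
print("same tree structure:", same_structure)
if same_structure:
    print("max |threshold difference|:", np.abs(recovered - t_raw.threshold).max())
```

If the structure check prints False for your data, comparing thresholds node by node is not meaningful, because the two trees split on different features.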

TomVW