I am using graphviz to plot a classification decision tree.
Before fitting, I scale the features with "preprocessing.StandardScaler()",
so when I plot the decision tree the node thresholds are shown as the "transformed values".
Is there a way to "inverse_transform" the classifier before plotting, so that the decision tree shows the actual values at the nodes instead of the transformed ones?
Yes, I have tried scale.inverse_transform(rf_clf), but of course that doesn't work.
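Import the libraries and the dataset from sklearn.datasets
import numpy as np
import numpy.random as nr
import pandas as pd
import graphviz
import sklearn.model_selection as ms
from sklearn import datasets, preprocessing, tree
iris = datasets.load_iris()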
Create a data frame from the dictionary
species = [iris.target_names[x] for x in iris.target]
iris = pd.DataFrame(iris['data'], columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
iris['Species'] = species
converting to arrays
Features = np.array(iris[['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']])
levels = {'setosa':0, 'versicolor':1, 'virginica':2}
Labels = np.array([levels[x] for x in iris['Species']])
splitting
nr.seed(1115)
indx = range(Features.shape[0])
indx = ms.train_test_split(indx, test_size = 100)
X_train = Features[indx[0],:]
y_train = np.ravel(Labels[indx[0]])
X_test = Features[indx[1],:]
y_test = np.ravel(Labels[indx[1]])
scaling:
scale = preprocessing.StandardScaler()
scale.fit(X_train)
X_train = scale.transform(X_train)
fitting the classifier
rf_clf = tree.DecisionTreeClassifier() ###simple TREE
rf_clf.fit(X_train, y_train)
plotting the decision tree with graphviz:
dot_data = tree.export_graphviz(rf_clf, out_file=None,
                                feature_names=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'],
                                class_names=['setosa', 'versicolor', 'virginica'],
                                filled=True, rounded=True,
                                special_characters=True)
print(dot_data)
graph = graphviz.Source(dot_data)
graph
The first node reads "Petal_Width <= 0.53" and the second node reads "Petal_Length <= -0.788", which is a negative number for a real, physical quantity.
I would prefer the tree to show the real values in the original units (the iris measurements are in centimetres, not inches).
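
To make concrete what "inverse transforming the classifier" would mean: StandardScaler computes z = (x - mean) / scale for each feature, so a node test "z <= t" on scaled data is the same test as "x <= t * scale + mean" on the raw data. The sketch below applies that formula to a list of node thresholds. The helper name thresholds_in_original_units is hypothetical, and the means and scales are made-up illustrative numbers, not values taken from the fitted scaler.

```python
def thresholds_in_original_units(features, thresholds, means, scales):
    """Map split thresholds learned on standardized data back to raw units.

    StandardScaler computes z = (x - mean) / scale per feature, so a split
    "z <= t" is equivalent to "x <= t * scale + mean".
    Nodes with a negative feature index are treated as leaves and map to None.
    """
    converted = []
    for feat, thr in zip(features, thresholds):
        if feat < 0:  # scikit-learn marks leaf nodes with a negative feature index
            converted.append(None)
        else:
            converted.append(thr * scales[feat] + means[feat])
    return converted


# Hypothetical numbers for illustration only (NOT read from the fitted model):
# per-feature means and standard deviations, as a StandardScaler would
# store them in scale.mean_ and scale.scale_.
means = [5.8, 3.0, 3.8, 1.2]
scales = [0.8, 0.4, 1.8, 0.8]

# Three nodes: a split on feature 3, a split on feature 2, and a leaf (-2).
# The two thresholds are the scaled values quoted in the question.
features = [3, 2, -2]
thresholds = [0.53, -0.788, -2.0]

original = thresholds_in_original_units(features, thresholds, means, scales)
print(original)
```

With the objects from the question, the inputs would presumably be rf_clf.tree_.feature, rf_clf.tree_.threshold, scale.mean_ and scale.scale_. Note that export_graphviz reads the thresholds from the fitted tree itself, so the converted values would still have to be put into the plot by some other means. Since each split only compares one feature against a threshold, fitting the tree on the unscaled features should in general give equivalent splits directly in the original units, which may be the simpler route.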