I have been playing around with sklearn a bit and following some simple examples online using the iris data.
I've now begun to play with some other data. I'm not sure whether this behaviour is correct and I'm just misunderstanding it, but every time I call fit(x, y) I get a completely different tree. So when I then run predictions, the results vary by around 10% from run to run, e.g. 60%, then 70%, then 65%, etc.
I ran the code below twice to output two trees so I could read them in Word. I tried searching for values from one doc in the other, and a lot of them I couldn't find. I kind of assumed fit(x, y) would always return the same tree for the same data. If that is the case, then I assume my train data of floats is tripping me up.
from sklearn import tree

# x_train and y_train are my training features and labels
clf_dt = tree.DecisionTreeClassifier()
clf_dt.fit(x_train, y_train)
with open("output2.dot", "w") as output_file:
    tree.export_graphviz(clf_dt, out_file=output_file)
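To make the comparison between two runs less manual than searching in Word, here is a minimal sketch of how the two exports could be compared directly. It assumes the differences come from the classifier's internal randomness, which the `random_state` parameter of `DecisionTreeClassifier` controls; I'm not certain that is the only cause in my case, since a random train/test split would also change the results. The iris data and the helper name `fit_and_export` are placeholders, not my real data or code.

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Placeholder data: iris stands in for the real x_train / y_train.
x_train, y_train = load_iris(return_X_y=True)


def fit_and_export(seed):
    """Fit a tree with a fixed seed and return its Graphviz text (hypothetical helper)."""
    clf = tree.DecisionTreeClassifier(random_state=seed)
    clf.fit(x_train, y_train)
    # With out_file=None, export_graphviz returns the dot source as a string.
    return tree.export_graphviz(clf, out_file=None)


first_run = fit_and_export(0)
second_run = fit_and_export(0)

# Two fits on the same data with the same random_state can be compared as strings.
print(first_run == second_run)
```

If the two strings match with a fixed seed but differ when `random_state` is left at its default, that would suggest the variation comes from the estimator rather than from the float data.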