In the picture above, the top-level node has 6499 samples, which are split into 3356 True and 3143 False. But if you follow the True path, the next node says it has 2644 samples. Why isn't it 3356? At every level, the sample counts seem to conflict with the numbers from the level above.
I think I'm just misunderstanding what samples and value mean, but in case the problem is in the code, here is the part that draws the graph:
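
    import collections

    import pydotplus
    from sklearn import tree

    # clf is the fitted decision tree; columns holds the dataset's column names.
    dot_data = tree.export_graphviz(clf,
                                    feature_names=columns[1:],
                                    out_file=None,
                                    filled=True,
                                    rounded=True)
    graph = pydotplus.graph_from_dot_data(dot_data)

    colors = ('green', 'red')
    edges = collections.defaultdict(list)

    # Group the child node ids under their parent node.
    for edge in graph.get_edge_list():
        edges[edge.get_source()].append(int(edge.get_destination()))

    # Recolor each parent's two children: lower node id green, higher id red.
    for edge in edges:
        edges[edge].sort()
        for i in range(2):
            dest = graph.get_node(str(edges[edge][i]))[0]
            dest.set_fillcolor(colors[i])

    graph.write_png('tree.png')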
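
To make the two possible readings concrete, here is a minimal sketch in plain Python (no scikit-learn; the rows and the threshold are made up for illustration). It assumes that samples is the number of rows reaching a node and that value is the per-class count at that node, which is my understanding of how export_graphviz labels a classifier's nodes, rather than the number of rows sent down the True and False branches. If that assumption holds, the root's value would not have to match either child's samples.

```python
# Toy data, invented for illustration: (feature value, class label) pairs.
rows = [(0.2, 0), (0.4, 0), (0.5, 1), (0.7, 0),
        (0.9, 1), (1.3, 1), (1.5, 0), (1.8, 1)]
threshold = 1.0  # hypothetical split condition: feature <= 1.0


def class_counts(node_rows):
    """Per-class counts at a node (the assumed meaning of 'value')."""
    counts = [0, 0]
    for _, label in node_rows:
        counts[label] += 1
    return counts


# Rows follow the True branch when the condition holds, else the False branch.
true_branch = [r for r in rows if r[0] <= threshold]
false_branch = [r for r in rows if r[0] > threshold]

root_samples = len(rows)         # rows reaching the root
root_value = class_counts(rows)  # class 0 vs class 1 at the root

print("root:  samples =", root_samples, "value =", root_value)
print("True:  samples =", len(true_branch), "value =", class_counts(true_branch))
print("False: samples =", len(false_branch), "value =", class_counts(false_branch))
```

In this toy case the root has value [4, 4], yet the True branch receives 5 rows and the False branch 3. The children's samples add up to the root's samples, and their per-class counts add up to the root's value, but the root's value says nothing about how many rows go down each branch.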