I feel like very similar versions of this question have been asked, and I have learned a lot while trying to solve this problem, but there is some (probably very basic) concept that I am missing.
Here are three very similar questions with very good answers:
Extract path from root to leaf in sklearn's agglomerative clustering
Yield all root-to-leaf branches of a Binary Tree
How do you visualize a ward tree from sklearn.cluster.ward_tree?
And there is some great stuff in Mike Bostock's D3 Git repo.
Now the specifics of my situation:
I have done some analysis in Python using sklearn Agglomerative Clustering. I am generating the dendrograms I would like to see with Matplotlib:
T=7: [image: T=7 Dendrogram]
T=2: [image: T=2 Dendrogram]
Now I would like to add those dendrograms and some other functionality to a web site. I have built a site using Django.
I have already used some D3 JavaScript functionality to implement a dynamic, interactive Tree Diagram like this one:
https://bl.ocks.org/d3noob/8375092
And I have made it load the information for each branch from a JSON file, so it is both dynamic and interactive. [image: Interactive Tree Diagram]
Now I want to mimic some of that functionality with the info from the Agglomerative Clustering.
I want to:
- Make a dendrogram similar to the one from Matplotlib and make it interactive: a slider should let the user change the T value, and the dendrogram should redraw.
  A. I am open to any suggestions. I can brute-force a solution by recalculating the dendrogram in Python (as a module inside the Django app), saving an image, and loading that image on the JavaScript side in the template. I think there is probably a more elegant solution with D3, but I am running out of time to do research.
- Create an interactive Tree Diagram using the info from the clustering. I would like the dendrogram to be far more interactive, as a tree. It seems like I should be able to use either agglo_model.children_ or the linkage_matrix.
agglo_model.children_:
[[ 35  36]
 [ 13  18]
 [ 19  20]
 ...
 [ 22  69]
 [ 33  34]
 [ 14  32]]
or the linkage_matrix:
linkage_matrix:
[[ 35. 36. 0. 2. ]
[ 22. 69. 1.73205081 4. ]
...
[ 50. 57. 4.47213595 2. ]
[ 9. 41. 4.69041576 2. ]
...
[116. 126. 12.62713128 36. ]
[128. 129. 17.97652791 66. ]]
The key piece I am missing is how to go from the scikit-learn output (agglo_model.children_ or the linkage_matrix) to the following tree format for d3.js:
var treeData = [
  {
    "name": "Top Level",
    "parent": "null",
    "children": [
      {
        "name": "Level 2: A",
        "parent": "Top Level",
        "children": [
          {
            "name": "Son of A",
            "parent": "Level 2: A"
          },
          {
            "name": "Daughter of A",
            "parent": "Level 2: A"
          }
        ]
      },
      {
        "name": "Level 2: B",
        "parent": "Top Level"
      }
    ]
  }
];
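To make the gap concrete, here is a minimal sketch of the conversion I think I need, not something I have working against my real data. It assumes agglo_model.children_ describes the full tree (n_samples - 1 merges), in which case ids below n_samples are leaves and row i describes the node with id n_samples + i. The function name children_to_d3_tree and the "leaf N" / "cluster N" names are made up for illustration.

```python
import json


def children_to_d3_tree(children, labels=None):
    """Nest an sklearn-style ``children_`` array into d3's name/parent/children dicts."""
    # Plain ints so the result is JSON-serializable (works for lists and NumPy arrays).
    merges = [(int(a), int(b)) for a, b in children]
    n_samples = len(merges) + 1  # assumption: full tree, i.e. n_samples - 1 merges

    def name_of(node_id):
        if node_id < n_samples:  # ids below n_samples are original samples (leaves)
            return str(labels[node_id]) if labels is not None else f"leaf {node_id}"
        return f"cluster {node_id}"  # hypothetical naming scheme

    def build(node_id, parent_name):
        node = {"name": name_of(node_id), "parent": parent_name}
        if node_id >= n_samples:
            # Row i of children_ describes the node with id n_samples + i.
            left, right = merges[node_id - n_samples]
            node["children"] = [build(left, node["name"]), build(right, node["name"])]
        return node

    root_id = 2 * n_samples - 2  # the last merge is the root
    return [build(root_id, "null")]


# Toy input: 4 samples, merged as (0,1) -> 4, (2,3) -> 5, (4,5) -> 6.
tree_data = children_to_d3_tree([[0, 1], [2, 3], [4, 5]])
print(json.dumps(tree_data, indent=2))
```

The printed JSON has the same shape as treeData above, so in principle it could be written to the file the tree diagram already loads. The recursion is fine for a tree of my size, but a very deep, chain-like tree could hit Python's recursion limit.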
- Show a clustering diagram on the web page. Basically, I'd like to mimic this page: Agglomerative Clustering and MatplotLib Diagrams and Dendrograms with interactive JavaScript.
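For the slider, one possible starting point is to recompute only the flat cluster labels for a given T on the Python side and let D3 recolor or redraw from them. The sketch below assumes that T is a distance threshold (I have not confirmed that this matches how T is used in my Matplotlib plots), that the linkage matrix rows are [child_a, child_b, distance, count], and that merge distances never decrease from row to row. The name cut_linkage is hypothetical; I believe scipy's fcluster(Z, t=T, criterion='distance') should give the same grouping, possibly with different label numbers.

```python
def cut_linkage(linkage_matrix, threshold):
    """Return one cluster label per sample, applying only merges with distance <= threshold."""
    n_samples = len(linkage_matrix) + 1
    # Union-find over all node ids; ids >= n_samples are clusters created by merges.
    parent = list(range(2 * n_samples - 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for row_index, (left, right, distance, _count) in enumerate(linkage_matrix):
        if distance <= threshold:
            new_id = n_samples + row_index  # id of the cluster this row creates
            parent[find(int(left))] = new_id
            parent[find(int(right))] = new_id

    # Relabel the surviving roots as 0, 1, 2, ... in order of first appearance.
    root_to_label = {}
    return [root_to_label.setdefault(find(s), len(root_to_label)) for s in range(n_samples)]


# Toy linkage matrix for 4 samples (made-up distances).
toy_linkage = [
    [0, 1, 1.0, 2],
    [2, 3, 2.0, 2],
    [4, 5, 5.0, 4],
]
for t in (0.5, 1.5, 3.0, 10.0):
    print(t, cut_linkage(toy_linkage, t))
```

The resulting list could be returned as JSON from a Django view each time the slider moves, which would avoid regenerating an image for every value of T.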
Again - any suggestions appreciated.