I am running agglomerative clustering on a data set of 130K rows (130K unique keys) and 7 columns, each column ranging from 20 to 2000 unique levels. The data are categorical, specifically alphanumeric codes. At most they can be thought of as factors. I am experimenting with what results I might get from a couple of alternatives to k-modes, including hierarchical clustering and MCA.
My question is, is there any good way to visualize the results up to a certain level with the tree structure?
Standard steps are not a problem:
library{cluster}
Compute Gower distance,
ptm <- proc.time() gower.dist <- daisy(df[,colnams], metric = c("gower")) elapsed <- proc.time() - ptm c(elapsed[3],elapsed[3]/60)
Compute agglomerative clustering object from Gower distance
aggl.clust.c <- hclust(gower.dist, method = "complete")
Now to plotting it. The following line works, but the plot is humanly unreadable
plot(aggl.clust.c, main = "Agglomerative, complete linkages")
Ideally what I am looking for would be something like so (the below is pseudocode that failed on my system)
plot(cutree(aggl.clust.c, k=7), main = "Agglomerative, complete linkages")
I am running R version 3.2.3. That version cannot change (and I don't believe it ought to make a difference for what I am trying to do).
I'd be interested in doing the same in Python, if anyone has good pointers.