
I am running agglomerative clustering on a data set of 130K rows (130K unique keys) and 7 columns, each column ranging from 20 to 2000 unique levels. The data are categorical, specifically alphanumeric codes. At most they can be thought of as factors. I am experimenting with what results I might get from a couple of alternatives to k-modes, including hierarchical clustering and MCA.

My question is, is there any good way to visualize the results up to a certain level with the tree structure?

Standard steps are not a problem:

library(cluster)
  • Compute the Gower distance:

    ptm <- proc.time()
    gower.dist <- daisy(df[,colnams], metric = c("gower"))
    elapsed <- proc.time() - ptm
    c(elapsed[3],elapsed[3]/60)
    
  • Compute the agglomerative clustering object from the Gower distance:

    aggl.clust.c <- hclust(gower.dist, method = "complete")
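Since the real table cannot be shared, here is a minimal reproducible sketch of the same two steps on hypothetical mock data: three factor columns of made-up alphanumeric codes (the column names `code1` to `code3`, the level counts and `n` are all assumptions). A dissimilarity object for n rows stores n(n-1)/2 values, about 8.4 billion for 130K rows, so a small n keeps the sketch quick.

```r
library(cluster)

# Hypothetical mock data standing in for the proprietary table:
# three factor columns of alphanumeric codes with 20, 50 and 200 levels.
set.seed(1)
n <- 500
df <- data.frame(
  code1 = factor(sample(sprintf("A%02d", 1:20),  n, replace = TRUE)),
  code2 = factor(sample(sprintf("B%03d", 1:50),  n, replace = TRUE)),
  code3 = factor(sample(sprintf("C%03d", 1:200), n, replace = TRUE))
)
colnams <- names(df)

# With only factor columns, Gower distance is the share of columns
# on which two rows have different codes.
gower.dist <- daisy(df[, colnams], metric = "gower")

# Same clustering call as in the question.
aggl.clust.c <- hclust(gower.dist, method = "complete")
```

Because every column is nominal, the distances here can only take the values 0, 1/3, 2/3 and 1, which is worth keeping in mind when reading merge heights later.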
    

Now to plotting it. The following line works, but the resulting plot is unreadable:

plot(aggl.clust.c, main = "Agglomerative, complete linkages")
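One base-R way to make the full plot legible is a sketch like the following: drop the leaf labels and outline the k groups. It uses the built-in `USArrests` data as a stand-in for the question's `aggl.clust.c`; `k = 7` and the border colours are illustrative choices. With 130K leaves this still draws every merge, so it mainly helps for moderate sizes.

```r
# Stand-in tree: USArrests ships with base R; substitute your own hclust object.
hc <- hclust(dist(USArrests), method = "complete")

# The leaf labels are what make a large dendrogram unreadable;
# labels = FALSE suppresses them and hang = -1 puts all leaves on the baseline.
plot(hc, labels = FALSE, hang = -1, main = "Agglomerative, complete linkages")

# Draw a rectangle around each of the 7 groups that cutree(hc, k = 7) returns.
# rect.hclust() invisibly returns a list of the member indices of each group.
members <- rect.hclust(hc, k = 7, border = 2:8)
```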

Ideally, I am looking for something like the following (this is pseudocode, and it failed on my system):

plot(cutree(aggl.clust.c, k=7), main = "Agglomerative, complete linkages")
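`cutree()` returns a vector of cluster memberships (one integer per row) rather than a tree, so plotting it cannot produce a dendrogram. A sketch of the intended plot instead converts the tree with `as.dendrogram()` and cuts it with `cut()` at a height that leaves exactly 7 branches. It assumes the built-in `USArrests` data as a stand-in. With Gower distances on categorical codes, many merge heights can tie; if the 6th and 7th largest heights are equal, no single height cut gives exactly 7 branches.

```r
# Stand-in tree: USArrests ships with base R; substitute your own hclust object.
hc <- hclust(dist(USArrests), method = "complete")

# cutree() gives group labels per observation; table() shows group sizes.
groups <- cutree(hc, k = 7)
print(table(groups))

# Cut halfway between the 6th- and 7th-largest merge heights: the 6 merges
# above the cut remain, so 7 branches hang below it (if those heights differ).
dend <- as.dendrogram(hc)
hs <- sort(hc$height, decreasing = TRUE)
h <- mean(hs[6:7])
pieces <- cut(dend, h = h)

# pieces$upper: the trimmed tree, with leaves labelled "Branch 1", "Branch 2", ...
# pieces$lower: a list of the 7 subtrees hanging below the cut.
plot(pieces$upper, main = "Agglomerative, complete linkages (top 7 branches)")
```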

I am running R version 3.2.3. That version cannot change (and I don't believe it ought to make a difference for what I am trying to do).

I'd be interested in doing the same in Python, if anyone has good pointers.

  • @Parfait, library(cluster). The data are, as stated, 130K rows x 7 columns. That table is too big to put in here, and it is proprietary, and it really has no relevance to answering the question. Cheers. – user3100205 Jul 02 '19 at 19:51
  • Always `dput` a few rows. See [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451). And please put all library lines in your post, not in comments. Ultimately we need fully runnable code with data for reproducibility. – Parfait Jul 02 '19 at 19:54
  • The data are proprietary. And not necessary to answer my question. Thanks. – user3100205 Jul 02 '19 at 19:55
  • See link above where you can set up mock data or R's built-in [datasets](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html) (e.g., iris, mtcars) – Parfait Jul 02 '19 at 20:24

1 Answer


I found a useful answer to my question about plotting part of a tree using the as.dendrogram() method. Link: http://www.sthda.com/english/wiki/beautiful-dendrogram-visualizations-in-r-5-must-known-methods-unsupervised-machine-learning
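As a sketch of what the `as.dendrogram()` route allows in base R alone, the snippet below prints only the top levels of the tree as text, then plots the top of the tree next to one of the branches hanging below the cut, so you can drill down one branch at a time. It assumes the built-in `USArrests` data as a stand-in. The `max.level = 2` and the choice of 4 branches are illustrative.

```r
# Stand-in tree: USArrests ships with base R; substitute your own hclust object.
hc <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

# Text summary of only the two top levels of the dendrogram.
str(dend, max.level = 2)

# Cut between the 3rd- and 4th-largest merge heights -> 4 branches.
hs <- sort(hc$height, decreasing = TRUE)
pieces <- cut(dend, h = mean(hs[3:4]))

# Left: the top of the tree; right: the subtree labelled "Branch 1".
op <- par(mfrow = c(1, 2))
plot(pieces$upper, main = "Top of tree")
plot(pieces$lower[[1]], main = "Branch 1")
par(op)
```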