I know dendrograms are quite popular. However, when there are a large number of observations and classes, they become hard to follow. I sometimes feel there should be a better way to present the same information. I have an idea but do not know how to implement it.

Consider the following dendrogram.

> data(mtcars)
> plot(hclust(dist(mtcars)))

[image: dendrogram of mtcars produced by the code above]

Can I plot it like a scatter plot, in which the distance between two points is drawn as a line, separate clusters (at an assumed threshold) are colored, and circle size is determined by the value of some variable?

[image: sketch of the desired scatter-style plot]

PeeHaa
fprd
  • The igraph package is what you're after – Tyler Rinker Jul 13 '12 at 01:30
  • You can use some form of multidimensional scaling (`cmdscale`) to find the coordinates, then draw the tree returned by `hclust`, and use `cut` to determine the node colours. – Vincent Zoonekynd Jul 13 '12 at 02:25
  • I do not know the actual math of it, but for just the plotting part the qgraph package may be helpful: https://sites.google.com/site/qgraphproject/examples – jon Jul 13 '12 at 03:58
  • Is subsetting the dendrogram an option? Using a network graph you hide some information about hierarchy. – Roman Luštrik Jul 13 '12 at 10:39

1 Answer

You are describing a fairly typical way of going about cluster analysis:

  • Use a clustering algorithm (in this case hierarchical clustering)
  • Decide on the number of clusters
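  • Project the data onto a two-dimensional plane using some form of principal component analysis (here classical multidimensional scaling via `cmdscale`, which for Euclidean distances gives the same coordinates as PCA up to sign)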

The code:

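d <- dist(mtcars)                    # Euclidean distances between cars
hc <- hclust(d)                      # hierarchical clustering
cluster <- cutree(hc, k=3)           # cut the tree into 3 clusters
xy <- data.frame(cmdscale(d), factor(cluster))  # 2-D coordinates plus cluster label
names(xy) <- c("x", "y", "cluster")
xy$model <- rownames(xy)             # keep the car names for labelling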

library(ggplot2)
ggplot(xy, aes(x, y)) + geom_point(aes(colour=cluster), size=3)
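
The question also asks for lines between points and for circle size driven by a variable. A minimal sketch of one way to do that follows. It assumes `hp` as the sizing variable (any numeric column of `mtcars` would do) and draws every pair of points, which is an illustrative choice and gets crowded quickly on larger data. The names `pairs`, `seg` and `within` are made up for this example.

```r
library(ggplot2)

d  <- dist(mtcars)
hc <- hclust(d)
xy <- data.frame(cmdscale(d), cluster = factor(cutree(hc, k = 3)))
names(xy)[1:2] <- c("x", "y")
xy$hp <- mtcars$hp                      # assumption: size circles by horsepower

# One row per pair of observations, flagged as within- or between-cluster
pairs <- t(combn(nrow(xy), 2))
seg <- data.frame(
  x    = xy$x[pairs[, 1]], y    = xy$y[pairs[, 1]],
  xend = xy$x[pairs[, 2]], yend = xy$y[pairs[, 2]],
  cluster = xy$cluster[pairs[, 1]],
  within  = xy$cluster[pairs[, 1]] == xy$cluster[pairs[, 2]]
)

p <- ggplot(xy, aes(x, y)) +
  # between-cluster pairs in light gray, drawn first so they sit in the background
  geom_segment(data = seg[!seg$within, ],
               aes(xend = xend, yend = yend), colour = "grey85") +
  # within-cluster pairs in the cluster colour
  geom_segment(data = seg[seg$within, ],
               aes(xend = xend, yend = yend, colour = cluster), alpha = 0.4) +
  geom_point(aes(colour = cluster, size = hp))
print(p)
```

The segment lengths are distances in the two-dimensional projection, so they only approximate the original distances in `d`.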

What happens next is that you get a skilled statistician to help explain what the x and y axes mean. This usually involves projecting the data to the axes and extracting the factor loadings.
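
As a rough sketch of that step, the axes can be inspected with `prcomp`. This assumes the data are centred but not scaled, to match the Euclidean `dist(mtcars)` used above; the name `loadings` is just a label for the first two columns of the rotation matrix.

```r
# PCA on centred, unscaled data (assumption: matches dist(mtcars) above)
pca <- prcomp(mtcars, center = TRUE, scale. = FALSE)

# Contribution of each original variable to the first two axes
loadings <- pca$rotation[, 1:2]
print(round(loadings, 3))

# The PCA scores should agree with the cmdscale coordinates up to sign
mds <- cmdscale(dist(mtcars))
same <- isTRUE(all.equal(abs(unname(pca$x[, 1:2])), abs(unname(mds)),
                         tolerance = 1e-6))
print(same)
```

Because the data are not scaled, variables with large variance (such as `disp` and `hp`) are likely to dominate the axes; standardising the columns before clustering would change both the clusters and the projection.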

The plot:

[image: scatter plot of the mtcars observations coloured by cluster]

Andrie
  • Many thanks Andrie. Is there a way to connect all the points (if possible with the same color within a cluster, and a different color such as gray between clusters)? – fprd Jul 13 '12 at 11:09