
I've used hclust to generate a cluster dendrogram of some data, but I need to isolate all the paired clusters, i.e. all the clusters that comprise just 2 pieces of data (the first ones to be clustered together), even if they might be clustered with other data on a "higher" branch. Does anyone know how I can do that?

I've highlighted the clusters I want to isolate in the attached image; hopefully that explains it better.

[Image: dendrogram with the paired clusters highlighted]

I'd like to isolate all the paired data in those clusters so that I can compare the clusters by their contents, for example to see which of them contain a particular type of data.

High Performance Mark
  • Please get used to providing a reproducible example, i.e. code (what you tried) including dummy data, ready for copying, pasting, and running. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – lukeA Mar 08 '16 at 14:23

1 Answer


FWIW, you could extract the "forks" like this:

hc <- hclust(dist(USArrests), "ave")
plot(hc)


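# Collect the labels of every fork, i.e. every node whose two children are both leaves
res <- list()
invisible(dendrapply(as.dendrogram(hc), function(x) {
  # A node with exactly two members that are both leaves is a fork
  if (attr(x, "members") == 2) 
    if (all(sapply(x[1:2], is.leaf))) 
      res <<- c(res, list(c(attr(x[[1]], "label"), attr(x[[2]], "label"))))
  x  # return the node unchanged
}))
# Bind the pairs into a two-column character matrix and show the first rows
head( do.call(rbind, res) )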
#     [,1]          [,2]            
# [1,] "Florida"     "North Carolina"
# [2,] "Arizona"     "New Mexico"    
# [3,] "Alabama"     "Louisiana"     
# [4,] "Illinois"    "New York"      
# [5,] "Michigan"    "Nevada"        
# [6,] "Mississippi" "South Carolina"

(just the first 6 rows of the result)
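
As an alternative sketch (not part of the approach above), the same forks can be read straight from the `merge` component of the `hclust` object: per `?hclust`, a negative entry in a row of `merge` stands for an original observation, so a row with two negative entries is a merge of two single observations. The names `is_pair`, `idx`, `pair_info` and `has_florida` are made up for illustration, and the "Florida" lookup is just one assumed way of checking pair contents.

```r
hc <- hclust(dist(USArrests), "ave")

# A row of hc$merge with two negative entries joins two original observations
is_pair <- hc$merge[, 1] < 0 & hc$merge[, 2] < 0

# Negative entries are observation indices with the sign flipped
idx <- -hc$merge[is_pair, , drop = FALSE]

# One row per pair, plus the height at which the pair was merged
pair_info <- data.frame(
  first  = hc$labels[idx[, 1]],
  second = hc$labels[idx[, 2]],
  height = hc$height[is_pair],
  stringsAsFactors = FALSE
)

# Illustrative content check: which pairs include a given label?
has_florida <- pair_info$first == "Florida" | pair_info$second == "Florida"
pair_info[has_florida, ]
```

The rows come out in merge order rather than in the left-to-right order of the dendrogram, so they will not line up row for row with the `dendrapply` result, and the two labels within a pair may be swapped. The `height` column is handy if you later want to keep only the pairs that were merged below some threshold.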

lukeA
  • Many thanks, that's exactly what I needed, and I've figured out how to progress from this point. In future I will try to provide a reproducible example. – Rquestion550 Mar 09 '16 at 14:13
  • You're welcome. Feel free to tick the check mark next to the answer if you think this solved the issue. – lukeA Mar 09 '16 at 14:42