I have a problem with group labels in hierarchical cluster analysis. As an example, this is the dendrogram for complete linkage on the iris data set.

[Image: dendrogram of complete-linkage clustering of the iris data set]

After I run

> table(cutree(hc, 3), iris$Species)

This is the output:

    setosa versicolor virginica
  1     50          0         0
  2      0         23        49
  3      0         27         1

I have read on a statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. But how do I know about the other two species? How do they end up in group 2 or group 3? Is there a calculation I need to know about?

ekad
Annie

1 Answer

I'm guessing this is how you produced the image, which doesn't seem to be showing at the moment.

> lmbjck <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)
> table(lmbjck, iris$Species)

lmbjck setosa versicolor virginica
     1     50          0         0
     2      0         23        49
     3      0         27         1

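dist computes the pairwise distances between the measurements of the 150 plants (three species). Viewed as a full matrix, the result has identical row and column names, one per plant.

> iris.dist <- dist(iris[1:4], "euclidean")
> iris.mat <- as.matrix(iris.dist)  # expand to a full 150 x 150 matrix so it has row/column names
> identical(rownames(iris.mat), colnames(iris.mat))
[1] TRUE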

That object is passed to hclust, which constructs the tree; cutree then cuts it into three groups. The object iris.order holds the order in which the leaves of the dendrogram are drawn. The group vector returned by cutree keeps the original row order; only the drawing of the tree follows iris.order.

> iris.hclust <- hclust(iris.dist)
> iris.cutree <- cutree(iris.hclust, 3)
> iris.order <- iris.hclust$order
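
A quick way to see what each of these objects holds, as a sketch that assumes the built-in iris data and hclust's default complete linkage: the order component should be a permutation of the row numbers (the left-to-right leaf order), while cutree should return one group label per row, in the original row order.

```r
# Rebuild the objects from above (built-in iris data, complete linkage by default).
iris.dist   <- dist(iris[1:4], method = "euclidean")
iris.hclust <- hclust(iris.dist)
iris.cutree <- cutree(iris.hclust, k = 3)
iris.order  <- iris.hclust$order

# order is a permutation of 1..150: row numbers in left-to-right leaf order.
stopifnot(all(sort(iris.order) == seq_len(nrow(iris))))

# cutree gives one label per row, in the original row order,
# so it lines up with iris$Species without any reordering.
stopifnot(length(iris.cutree) == nrow(iris))
print(table(iris.cutree, iris$Species))
```

Because the labels are already in row order, table() can cross-tabulate them against iris$Species directly.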

Here's proof. I've put together the original Species labels, the species labels in the order they appear in the dendrogram, the order number, and the group returned by cutree.

> data.frame(original = iris$Species, ordered = iris$Species[iris.order],
             order.num = iris.order, cutree = iris.cutree)

      original    ordered order.num cutree
1       setosa  virginica       108      1
2       setosa  virginica       131      1
3       setosa  virginica       103      1
4       setosa  virginica       126      1
5       setosa  virginica       130      1
6       setosa  virginica       119      1
    ...
103  virginica     setosa        31      2
104  virginica     setosa        26      2
105  virginica     setosa        10      2
106  virginica     setosa        35      2
107  virginica     setosa        13      3
108  virginica     setosa         2      2
    ...

Let's look at the output. The first line has 108 under order.num, meaning that the first leaf on the left side of the dendrogram is row 108 of the data. Skim down to line 108: the original Species is indeed virginica, and the cutree column on that line puts it in group 2. (The original and cutree columns are in data order, while ordered and order.num are in dendrogram order, so the cutree value on line 1 belongs to row 1, a setosa, not to row 108.) Now look at line 3. Under order.num you can see that this leaf is row 103. If you go down and check the original species on line 103, it's (still) virginica. I'll leave it as an exercise to check a few other (random) rows and convince yourself that cutree returns its groups in the original row order, which is what table() needs. The table at the beginning is therefore correct.
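
To check every leaf at once rather than row by row, one option is to index both the species and the group vector by the leaf order, so that each line describes a single plant. The sketch below also checks the labelling rule mentioned in the question. That cutree hands out group numbers in order of first appearance in the data is my understanding of its behaviour, so the code verifies it instead of assuming it. The names leaves and first.seen are made up for this illustration.

```r
iris.hclust <- hclust(dist(iris[1:4], method = "euclidean"))
iris.cutree <- cutree(iris.hclust, k = 3)
iris.order  <- iris.hclust$order

# One line per leaf, left to right in the dendrogram;
# all three columns refer to the same plant.
leaves <- data.frame(row     = iris.order,
                     species = iris$Species[iris.order],
                     group   = iris.cutree[iris.order])
print(head(leaves))

# Row at which each group label first occurs in the original data.
first.seen <- match(1:3, iris.cutree)
print(first.seen)

# If labels are assigned in order of first appearance, first.seen
# starts at row 1 and is increasing.
stopifnot(first.seen[1] == 1, !is.unsorted(first.seen))
```

If that check passes, group 1 is simply the cluster containing row 1, group 2 is the cluster of the first row that is not in group 1, and so on. The numbers are labels, not a ranking.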

Roman Luštrik
  • Thank you for your explanation @Roman Luštrik. I will go through your answer. And I will edit the picture. – Annie Jul 16 '12 at 06:30
  • I'm trying to edit, but somehow the picture is not showing. Another way to look at the image is to copy this link, http://i.stack.imgur.com/78c1m.png, into the address bar. Really sorry for the inconvenience. – Annie Jul 16 '12 at 06:40
  • @Annie, you probably don't have enough privileges yet. I've edited your question. If you feel the provided answer answered your question, feel free to mark it as such by clicking the grey check mark below the answer score. – Roman Luštrik Jul 16 '12 at 07:10
  • Yes @Roman Luštrik, thank you for the answer, it helped me. However, I keep wondering how the groups get their numbers. If I swap the labels, e.g. give group 2 the values of group 3 in the table above, and compute the ARI, it gives a completely different answer, which may be wrong. The table clearly shows what group 1 is, and likewise groups 2 and 3. It is just something that keeps me thinking, and I don't have the solution yet. – Annie Jul 16 '12 at 12:15