How to change node labels of dendrogram plot

Question

I did a hierarchical cluster analysis for a project. I have 300 observations of 20 variables each. I indexed all the variables so that each one lies between 0 and 1, with a larger value being better.

I used the following code to create a cluster plot.

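d_data <- dist(all_data[,-1])          # distances on the full data (not used below)
d_data_ind <- dist(data_ind[,-1])      # drop column 1 ("geography"), then compute distances
hc_data_ind <- hclust(d_data_ind, method = "complete")
dend <- as.dendrogram(hc_data_ind)
plot(dend)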

Right now the leaf labels are the row names, i.e. the numbers 1 to 300 (see top pic). During the analysis I removed the first column of the data frame, labeled "geography" (see bottom pic), because it contains city names as text, which would break the distance calculation. But I really need the city names on the cluster plot in their correct positions, because I need to choose a list of cities based on the results.

What code should I write to put the city names from the "geography" column onto this plot, matched to their row names?

As you can see from the data frame (bottom pic), the city names are in alphabetical order, in the same ascending order as the row names. I'm sure it isn't hard to put the city names onto the plot; I just can't find how by googling and asking around.

Please get used to providing reproducible code, ready to copy, paste and run, to make it easier for visitors & readers. (E.g. `all_data` is not given; screenshots of data sets are not helpful; providing the result of `dput(my_data)` is the way to go.) – lukeA, Apr 06 '16 at 22:22
[Why not improve your question now](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Jaap, Apr 07 '16 at 06:58

score 3 · Answer 1 · answered Apr 07 '16 at 18:56

I think what you are asking is "how can I decide on the labels in a dendrogram?" The answer has two parts. As an example, let's use the simple vector c(1,2,5,6).

1) When you build the hclust object from dist(), it uses the names of the items as labels. If there are no names, it uses a running index (1, 2, 3, ...) instead. For example:

x <- c(1,2,5,6)
d1 <- as.dendrogram(hclust(dist(x)))   # x has no names, so the leaves are labeled 1:4
plot(d1)
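To see where those numbers come from, one can inspect the "Labels" attribute that dist() stores and that hclust() copies into its `labels` component. A small base-R check, using only the stats package:

```r
x <- c(1, 2, 5, 6)

# Without names, dist() stores no labels, so hclust() has none either
d_plain <- dist(x)
attr(d_plain, "Labels")           # NULL
hclust(d_plain)$labels            # NULL, so plot() falls back to 1:4

# With names, dist() keeps them and hclust() reuses them as leaf labels
names(x) <- x
d_named <- dist(x)
attr(d_named, "Labels")
hclust(d_named)$labels
```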

This is obviously a problem, since the items we have are 1,2,5,6 and not 1:4! So how can we fix this? One way is to set the names. For example:

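x <- c(1,2,5,6)
names(x) <- x                          # use the values themselves as names
x                                      # print the named vector
d2 <- as.dendrogram(hclust(dist(x)))   # dist() passes the names on as labels
plot(d2)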
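Applied to the question's setup, the same idea means copying the geography column into the row names before calling dist(). The sketch below assumes, as the question describes, that `data_ind` has the city names in its first column; the four cities and their values are made up purely for illustration.

```r
# Hypothetical stand-in for the question's data_ind:
# column 1 holds city names, the other columns are indexed (0-1) variables
data_ind <- data.frame(
  geography = c("Austin", "Boston", "Chicago", "Denver"),
  v1 = c(0.10, 0.90, 0.50, 0.20),
  v2 = c(0.30, 0.80, 0.60, 0.25),
  stringsAsFactors = FALSE
)

# dist() takes its labels from the row names, so put the city names there
rownames(data_ind) <- data_ind$geography

d_data_ind <- dist(data_ind[, -1])          # numeric columns only
hc_data_ind <- hclust(d_data_ind, method = "complete")
dend <- as.dendrogram(hc_data_ind)
plot(dend)                                   # leaves now show city names
```

Row names must be unique, so if two cities share a name, something like `make.unique()` would be needed before assigning them.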

I believe this basically solves your problem (and, frankly, doesn't require dendextend). But if you want to update the labels AFTER creating the dendrogram, read on:

2) The dendextend package allows you to update the labels of an existing dendrogram. But you need to make sure you are using the correct order, since the order of the original vector and the order of the leaves in the tree are generally not the same! Here is how it can be done:

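if (!requireNamespace("dendextend", quietly = TRUE)) install.packages("dendextend")
library(dendextend)
x <- c(1,2,5,6)
d3 <- as.dendrogram(hclust(dist(x)))
# order.dendrogram() gives the original index of each leaf, left to right,
# so reorder the new labels to match the leaves before assigning them
labels(d3) <- x[order.dendrogram(d3)]
plot(d3)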
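Why the reordering with order.dendrogram() matters is easier to see on data whose leaf order differs from the row order. A base-R check on the first five rows of USArrests (the same subset used in the other answer):

```r
d <- as.dendrogram(hclust(dist(USArrests[1:5, ])))

# Row index (in the original data) of each leaf, read left to right;
# for this subset it is not simply 1:5
leaf_rows <- order.dendrogram(d)
leaf_rows

# The leaf labels are the row names taken in that same order, so new labels
# must be indexed with leaf_rows before being assigned in leaf order
labels(d)
rownames(USArrests)[1:5][leaf_rows]
```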

Here is how we would do it for a more complex data object, where we may prefer to update the dendrogram rather than modify the object's row names:

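if (!requireNamespace("dendextend", quietly = TRUE)) install.packages("dendextend")
library(dendextend)
x <- CO2[,4:5]                         # numeric columns: conc and uptake
d4 <- as.dendrogram(hclust(dist(x)))
# One label per row from Plant, Type and Treatment, then put them in leaf order
labels(d4) <- apply(CO2[,1:3], 1, paste, collapse = "_")[order.dendrogram(d4)]

d4 <- set(d4, "labels_cex", 0.6)       # shrink the label text
d4 <- color_branches(d4, k = 3)        # color the branches of 3 clusters
par(mar = c(3,0,0,6))                  # extra right margin for the labels
plot(d4, horiz = TRUE)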
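If installing dendextend is not an option, the relabeling step can also be sketched with base R's dendrapply(), which visits every node of the tree. This is a swapped-in base-R technique, not part of dendextend: each leaf of a dendrogram built from hclust() stores the row index of its observation, and the function below uses that index to look up the new label (`new_labels` and `relabel_leaf` are illustrative names).

```r
x <- c(1, 2, 5, 6)
d_base <- as.dendrogram(hclust(dist(x)))

# One new label per observation, in the original data order
new_labels <- paste0("value_", x)

# A leaf's integer value is its observation's row index,
# so this lookup keeps each label attached to the right leaf
relabel_leaf <- function(node) {
  if (is.leaf(node)) {
    attr(node, "label") <- new_labels[as.integer(node)]
  }
  node
}

d_base <- dendrapply(d_base, relabel_leaf)
plot(d_base)
```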

score 2 · Answer 2 · answered Apr 06 '16 at 22:20


You want the original labels instead of IDs? Maybe this helps you with your analysis:

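data <- USArrests[1:5, ]
# Move the state names into a "label" column (kept as character, not factor)
data <- data.frame(label = row.names(data), data, stringsAsFactors = FALSE)
row.names(data) <- NULL          # row names are now just 1:5, as in the question
d <- dist(data[, -1])            # drop the label column before computing distances
hc <- hclust(d)
plot(hc)
rect.hclust(hc, h=40)            # draw boxes around the clusters cut at height 40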

data$label[order.dendrogram(as.dendrogram(hc))]
# [1] "Arkansas"   "Arizona"    "California" "Alabama"    "Alaska"  

clusters <- cutree(hc, h=40)
split(data$label, clusters)
# $`1`
# [1] "Alabama" "Alaska" 
# 
# $`2`
# [1] "Arizona"    "California"
# 
# $`3`
# [1] "Arkansas"

hc$labels <- data$label   # labels go in the original row order, not the plotting order
plot(hc)
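If the names are stored as row names (as in the other answer) rather than in a separate column, cutree() returns a named vector, so the list of names in each cluster can be read off directly. A base-R sketch on the same USArrests subset:

```r
data <- USArrests[1:5, ]          # state names are the row names
hc <- hclust(dist(data))          # so they become the leaf labels

# cutree() names its result with hc$labels: one cluster id per state
clusters <- cutree(hc, h = 40)
clusters

# Group the state names by cluster id
groups <- split(names(clusters), clusters)
groups
```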

PS: I found it helpful to save dendrograms to PDF, where you can zoom in and out easily: `pdf("my.pdf"); plot(hc); dev.off()`.
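With 300 leaves, as in the question, the default page is likely too narrow for readable labels. A sketch that widens the PDF and shrinks the label text, using all 50 rows of USArrests as a stand-in and a temporary file as a hypothetical output path:

```r
hc <- hclust(dist(USArrests))                        # 50 leaves labeled by state name
out_file <- file.path(tempdir(), "dendrogram.pdf")   # hypothetical output path

pdf(out_file, width = 20, height = 8)   # page size in inches; widen for more leaves
plot(hc, cex = 0.6)                     # cex shrinks the leaf labels
dev.off()
```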


lukeA


I tried this solution and it returned an error... I ended up using a sketchbook to enter the city names by hand according to the row numbers lol, but I'll poke around more when I have time – Elan Apr 07 '16 at 03:37
_"tried this solution and returned error.."_ - what exactly did you try, what error message did it return? You should edit your post and add the data plus the full code to reproduce your problem. Otherwise it's not possible to help. – lukeA Apr 07 '16 at 04:31

