I am creating dendrograms using ggdendro and coloring them according to cutpoints in the branches. I'm using the approach provided by @jlhoward in this question (Colorize Clusters in Dendogram with ggplot2), but I run into problems when my leaf labels are very long.
Here is some example code:
df <- USArrests
labs <- paste("veryverylongtitlename",1:50,sep="")
rownames(df) <- labs
library(ggplot2)
library(ggdendro)
hc <- hclust(dist(df), "ave") # hierarchical clustering
dendr <- dendro_data(hc, type="rectangle") # convert for ggplot
clust <- cutree(hc,k=2) # find 2 clusters
clust.df <- data.frame(label=names(clust), cluster=factor(clust))
# dendr[["labels"]] has the labels, merge with clust.df based on label column
dendr[["labels"]] <- merge(dendr[["labels"]],clust.df, by="label")
# plot the dendrogram; note use of color=cluster in geom_text(...)
ggplot() +
  geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend)) +
  geom_text(data=label(dendr), aes(x, y, label=label, hjust=0, color=cluster),
            size=3) +
  coord_flip() + scale_y_reverse(expand=c(0.2, 0)) +
  theme(axis.line.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_rect(fill="white"),
        panel.grid=element_blank())
As you can see, the labels here get cut off. I found this answer (decrease size of dendogram (or y-axis) ggplot), but I don't want to use it because I very much like the ability to use cutree to define my clusters. How can I manipulate the above code to fit the long labels? Many thanks!