Is there a function that avoids overlapping data labels for identical data points in a scatter plot? I have checked the various questions and answers on textxy, direct.label, and geom_text(), but without success. Maybe it's simply not possible.

Here's a sample of the relevant data:

vetotype.x <- structure(list(cowc = structure(c(5L, 7L, 24L, 24L, 23L, 36L, 
34L, 38L, 23L, 6L, 8L, 38L, 38L, 23L, 5L, 7L, 24L, 24L, 23L, 
36L, 34L, 38L, 23L, 6L, 8L, 38L, 38L, 23L), .Label = c("AFG", 
"ANG", "AZE", "BNG", "BOS", "BUI", "CAM", "CDI", "CHA", "COL", 
"CRO", "DOM", "DRC", "ETH", "GNB", "GRG", "GUA", "IND", "INS", 
"IRQ", "KEN", "LAO", "LBR", "LEB", "MAL", "MLD", "MZM", "NEP", 
"NIC", "PHI", "PNG", "RUS", "RWA", "SAF", "SAL", "SIE", "SOM", 
"SUD", "TAJ", "UKG", "YAR", "ZIM"), class = "factor"), conflict = c("Bosnia 92-95", 
"Cambodia 70-91", "Lebanon 58-58", "Lebanon 75-89", "Liberia 89-93", 
"SieLeo 91-96", "Stafrica 83-91", "Sudan 63-72", "Liberia 94-96", 
"Burundi 1993-2005", "Cote d'Ivoire 2002-2007", "Darfur, Sudan 2003-2010", 
"Sudan 83-05", "Liberia 1999-2003", "Bosnia 92-95", "Cambodia 70-91", 
"Lebanon 58-58", "Lebanon 75-89", "Liberia 89-93", "SieLeo 91-96", 
"Stafrica 83-91", "Sudan 63-72", "Liberia 94-96", "Burundi 1993-2005", 
"Cote d'Ivoire 2002-2007", "Darfur, Sudan 2003-2010", "Sudan 83-05", 
"Liberia 1999-2003"), totalps = c(3L, 2L, 2L, 2L, 1L, 3L, 4L, 
3L, 1L, 3L, 3L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 3L, 4L, 3L, 1L, 
3L, 3L, 4L, 3L, 3L), vetotype = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("strictvetos", "lenientvetos"
), class = "factor"), intensity = c(3L, 4L, 2L, 5L, 2L, 2L, 2L, 
2L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 4L, 2L, 6L, 2L, 2L, 4L, 2L, 2L, 
3L, 3L, 2L, 2L, 2L)), .Names = c("cowc", "conflict", "totalps", 
"vetotype", "intensity"), class = "data.frame", row.names = c(NA, 
-28L))

Here's my code:

library(ggplot2)

vetotype.plot <- ggplot(vetotype.x, aes(x=totalps, y=intensity, color=conflict))+
      geom_point() + 
      labs(x="number of power-sharing arenas", y="intensity") +
      ggtitle("Number of Power-Sharing areas and Veto intensity") +
      geom_text(aes(label=conflict),hjust=0, vjust=0, size=4)+
      scale_x_continuous(limits=c(1, 5))+
      theme(legend.position="none")+
      facet_wrap(~vetotype, nrow=2)

plot(vetotype.plot)

And below is my graph. I manually highlighted those data points which are overlapping.

What I am looking for is an 'automatic' way to display the labels of overlapping data points so that they don't overlap. Is there a function for this purpose? Many thanks!

[Figure: the scatter plot, with the overlapping data points highlighted manually]

zoowalk
  • Take a look at the `directlabels` package and this [relevant question](http://stats.stackexchange.com/questions/16057/how-do-i-avoid-overlapping-labels-in-an-r-plot) – Justin Feb 05 '14 at 18:37
  • Potential [duplicate question](http://stackoverflow.com/questions/11197554/how-to-jitter-text-to-avoid-overlap-in-a-ggplot2-scatterplot) – BrodieG Feb 05 '14 at 19:06
  • Many thanks. I looked again at the directlabels package. I figured out that geom_text(aes(label=conflict),hjust=0, vjust=0, size=4) has to be taken out when building the ggplot, since the labels are added later via direct.label(vetotype.plot). The result is OK, imo, but not as nice as jlhoward's solution below. – zoowalk Feb 06 '14 at 11:02

3 Answers

This is not a completely general solution, but it does seem to work in your case.

library(ggplot2)
# identify duplicated points
dupes <- aggregate(conflict~totalps+intensity+vetotype,vetotype.x,length)
colnames(dupes)[4] = "dupe"
df <- merge(vetotype.x,dupes)   # add dupe column
df$vjust <- 0                   # default vertical offset is 0
# calculate vertical offsets based on number of dupes
for (i in 2:max(df$dupe)) {
  df[df$dupe==i,]$vjust <- seq(-trunc(i/2), -trunc(i/2)+i-1)
}
# render the plot
vetotype.plot <- ggplot(df, aes(x=totalps, y=intensity, color=conflict))+
  geom_point() + 
  labs(x="number of power-sharing arenas", y="intensity") +
  ggtitle("Number of Power-Sharing areas and Veto intensity") +
  geom_text(aes(label=conflict,vjust=vjust), hjust=0,size=4)+
  scale_x_continuous(limits=c(1, 5))+
  scale_y_continuous(limits=c(1, 6))+
  theme(legend.position="none")+
  facet_wrap(~vetotype, nrow=2)

plot(vetotype.plot)
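
The same offsets can also be computed without looping over the duplicate counts. The following is only a sketch in base R, not part of the original answer: the helper name `stack_offsets` and the toy data frame `demo` are made up for illustration. It numbers the rows that share identical key values and centres those numbers around zero, using the same formula as the loop above.

```r
# Hypothetical helper (an assumption, not from the answer): for each row,
# return a vertical offset so that labels sharing one point are stacked.
stack_offsets <- function(data, keys) {
  # one group per distinct combination of the key columns
  g <- interaction(data[keys], drop = TRUE)
  rows <- seq_len(nrow(data))
  idx <- ave(rows, g, FUN = seq_along)  # position within the group: 1, 2, ..., n
  n   <- ave(rows, g, FUN = length)     # size of the group
  idx - 1 - trunc(n / 2)                # e.g. n = 3 gives -1, 0, 1
}

# Toy data (made up): three labels at (1, 2), one at (2, 5), two at (3, 4).
demo <- data.frame(
  x = c(1, 1, 1, 2, 3, 3),
  y = c(2, 2, 2, 5, 4, 4),
  facet = "a"
)
demo$vjust <- stack_offsets(demo, c("x", "y", "facet"))
print(demo$vjust)
# -1  0  1  0 -1  0
```

With the question's data, the key columns would be `totalps`, `intensity`, and `vetotype`, and the resulting column can be mapped with `aes(vjust=vjust)` in `geom_text()` exactly as in the plot code above.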

jlhoward
  • Many thanks! To me that's quite a creative approach with an excellent result. I wouldn't have been able to figure this out on my own. – zoowalk Feb 06 '14 at 10:55

ggrepel can now do this easily:

https://twitter.com/slowkow/status/686341190749392896

Ari B. Friedman

Here's what your plot looks like with ggrepel:

library(ggrepel)

ggplot(vetotype.x, aes(x=totalps, y=intensity, color=conflict))+
  geom_point() + 
  labs(x="number of power-sharing arenas", y="intensity") +
  ggtitle("Number of Power-Sharing areas and Veto intensity") +
  geom_text_repel(
    aes(label=conflict), size=4, box.padding = unit(0.5, "lines")
  )+
  scale_x_continuous(limits=c(1, 5))+
  theme(legend.position="none")+
  facet_wrap(~vetotype, nrow=2)

[Figure: the plot rendered with ggrepel]
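
When several labels sit on exactly the same point, a few `geom_text_repel()` arguments may be worth setting explicitly. This is a minimal sketch on made-up toy data (the data frame `pts` is an assumption, not the question's data), and it assumes ggrepel 0.9.0 or later, where `max.overlaps` is available.

```r
library(ggplot2)
library(ggrepel)

# Toy data (made up): three labels share the point (2, 3).
pts <- data.frame(
  x = c(2, 2, 2, 4),
  y = c(3, 3, 3, 5),
  label = c("alpha", "beta", "gamma", "delta")
)

p <- ggplot(pts, aes(x = x, y = y, label = label)) +
  geom_point() +
  geom_text_repel(
    seed = 42,              # fix the random start so the layout is reproducible
    max.overlaps = Inf,     # do not drop labels that overlap many others
    min.segment.length = 0  # always draw a segment from each label to its point
  )

print(p)
```

The segments make it easier to see which labels belong to the shared point once the labels have been pushed apart.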

Kamil Slowikowski