
I have a rather dense scatterplot that I am building with R's 'ggplot2', and I want to label a subset of the points using 'ggrepel'. I want to plot ALL points in the scatterplot but label only a subset with ggrepel. When I do this, ggrepel does not take the other (unlabelled) points into account when deciding where to put the labels, so the labels end up overlapping points that I don't want to label.

Here is an example plot illustrating the issue.

# load packages
library(data.table)
library(stringi)
library(ggplot2)
library(ggrepel)

# generate data:
set.seed(20180918)
dt = data.table(
  name = stri_rand_strings(3000,length=6),
  one = rnorm(n = 3000,mean = 0,sd = 1),
  two = rnorm(n = 3000,mean = 0,sd = 1))
dt[, diff := one -two]
dt[, diff_cat := ifelse(one > 0 & two>0 & abs(diff)>1, "type_1",
                        ifelse(one<0 & two < 0 & abs(diff)>1, "type_2",
                               ifelse(two>0 & one<0 & abs(diff)>1, "type_3",
                                      ifelse(two<0 & one>0 & abs(diff)>1, "type_4", "other"))))]

# make plot
ggplot(dt, aes(x=one,y=two,color=diff_cat))+
  geom_point()

[Figure: plot without labels]

If I plot only the subset of points I want labelled, ggrepel is able to place all of the labels without overlapping any of the plotted points or the other labels.

ggplot(dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
  aes(x=one,y=two,color=diff_cat))+
  geom_point()+
  geom_text_repel(data = dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
                  aes(x=one,y=two,label=name))

[Figure: plot of the labelled points only]

However, when I plot this subset AND the full data at the same time, the labels overlap the unlabelled points:

# now add labels to a subset of points on the plot
ggplot(dt, aes(x=one,y=two,color=diff_cat))+
  geom_point()+
  geom_text_repel(data = dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
                  aes(x=one,y=two,label=name))

[Figure: plot with labels]

How can I get the labels for the subset of points to not overlap the points from the original data?

Reilstein

1 Answer


You can try the following:

  1. Assign a blank label ("") to all the other points from the original data. geom_text_repel then receives every point and repels the labels away from all of them, but draws no text for the blank ones;
  2. Increase the box.padding parameter from its default of 0.25 to a larger value, for greater distance between labels;
  3. Expand the x- and y-axis limits, to give the labels more room to move into on all four sides.

Example code (with box.padding = 1):

library(dplyr) # provides %>% and mutate()

ggplot(dt, 
       aes(x = one, y = two, color = diff_cat)) +
  geom_point() +
  geom_text_repel(data = . %>% 
                    mutate(label = ifelse(diff_cat %in% c("type_1", "type_2") & abs(diff) > 2,
                                          name, "")),
                  aes(label = label), 
                  box.padding = 1,
                  show.legend = FALSE) + #this removes the 'a' from the legend
  coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +
  theme_bw()

[Figure: plot with box.padding = 1]

Here's another attempt, with box.padding = 2:

[Figure: plot with box.padding = 2]
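Since the question already uses data.table, step 1 can also be done by adding the label column by reference with `:=`. This avoids the dplyr/magrittr pipe inside `data =`. The sketch below regenerates the question's simulated data and otherwise mirrors the settings above (box.padding = 1, axis limits of ±5). The column name `label` is just an illustrative choice.

```r
library(data.table)
library(stringi)
library(ggplot2)
library(ggrepel)

# Same simulated data as in the question
set.seed(20180918)
dt <- data.table(
  name = stri_rand_strings(3000, length = 6),
  one  = rnorm(3000),
  two  = rnorm(3000))
dt[, diff := one - two]
dt[, diff_cat := ifelse(one > 0 & two > 0 & abs(diff) > 1, "type_1",
                 ifelse(one < 0 & two < 0 & abs(diff) > 1, "type_2",
                 ifelse(two > 0 & one < 0 & abs(diff) > 1, "type_3",
                 ifelse(two < 0 & one > 0 & abs(diff) > 1, "type_4", "other"))))]

# Step 1: every row gets a label, but only the subset of interest gets a
# non-empty one. Rows labelled "" draw no text, yet their points are still
# passed to geom_text_repel and are repelled from.
dt[, label := ifelse(diff_cat %in% c("type_1", "type_2") & abs(diff) > 2, name, "")]

p <- ggplot(dt, aes(x = one, y = two, color = diff_cat)) +
  geom_point() +
  geom_text_repel(aes(label = label),
                  box.padding = 1,     # step 2: more space around each label
                  show.legend = FALSE) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +  # step 3: room at the edges
  theme_bw()
print(p)
```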

(Note: I'm using ggrepel 0.8.0. I'm not sure whether all of these features are available in earlier versions of the package.)
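On newer ggrepel releases (0.9.0 and later, i.e. after the version used above), geom_text_repel() also has a max.overlaps argument. With its default of 10, labels that overlap too many other labels or data points are left out. In a plot this dense, where every point is passed to the geom, that may drop some of the labels you asked for. Setting max.overlaps = Inf keeps them all. Below is a minimal sketch on a hypothetical stand-in data set (not the question's data). On ggrepel versions before 0.9.0 the argument does not exist, and ggplot2 should just warn that it is ignored.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical stand-in data: 3000 random points, of which only those far
# from the diagonal get a non-empty label; all others get "".
set.seed(1)
df <- data.frame(one = rnorm(3000), two = rnorm(3000))
df$name  <- sprintf("pt%04d", seq_len(nrow(df)))
df$label <- ifelse(abs(df$one - df$two) > 3, df$name, "")

p <- ggplot(df, aes(x = one, y = two)) +
  geom_point() +
  geom_text_repel(aes(label = label),
                  box.padding = 1,
                  # ggrepel >= 0.9.0: keep labels even if they overlap
                  # many points/labels (default max.overlaps is 10)
                  max.overlaps = Inf) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +
  theme_bw()
print(p)
```

Keeping every label this way can push some of them far from their points. The larger box.padding and wider axis limits from the steps above give them room to land.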

Z.Lin