I am starting without including the data. The problem comes from using geom_smooth with a large data set, so a minimal data example is difficult to construct (I tried). I can share the data on request.

I have scores on several variables and want to see trends in these scores across the age of respondents (cross-sectional data). Data are now in long format (so the original variables are all under the column 'name').

Like this:

     age name     value
   <dbl> <chr>    <dbl>
 1    40 mo_clean     1
 2    40 mo_groc      3
 3    40 mo_trans     1
 4    40 mo_digi      3
 5    40 mo_emo       3
 6    40 mo_activ     1
 7    40 mo_supv      1
 8    40 mo_doct      1
 9    39 mo_clean     1
10    39 mo_groc      1
# … with 42,030 more rows

I want to:

  • use geom_smooth and geom_label and
  • then switch to ggrepel::geom_label_repel to avoid overlapping labels

Getting labels to work with geom_smooth turned out to be difficult, but I managed to do so with the code below:

library(ggplot2)
library(ggrepel)
library(dplyr)  # %>%, group_by(), do(), filter()
library(broom)  # augment()

df %>%
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      guides(color = "none") +
      geom_label(
        data = group_by(., name) %>%
          do(augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted), nudge_x = 2
      )
  } +
  scale_x_continuous(breaks = seq(35, 65, by = 5)) +
  xlab("Age") +
  ylab(" ") +
  theme(text = element_text(size = 14))

which gives this result:

[Figure: result prior to trying ggrepel]

Now, as anticipated, substituting geom_label with geom_label_repel does not work, due to the many data points. I get the following warning:

`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Warning message:
ggrepel: 720 unlabeled data points (too many overlaps). Consider increasing max.overlaps 

and all labels in the figure are dropped.

Increasing max.overlaps is not the way to go, I assume. Just to illustrate the extreme case, with max.overlaps = Inf:

[...]
      geom_label_repel(
        data = group_by(., name) %>%
          do(augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted), 
        max.overlaps = Inf
      )
[...]

[Figure: result with max.overlaps = Inf]

Any hint? For instance where to find help (or even code suggestions)? Lots of web searches have not given me what I'm looking for: how to combine geom_smooth with geom_label_repel to get a nice plot with each smoothed line labelled, without labels overlapping.

---

My question refers to geom_smooth with lots of data points, whereas the linked question (Plot labels at ends of lines) refers to geom_line with few data points.

Note, however, that some of the answers to the other posts mention geom_smooth and present code with geom_smooth. So, I recommend looking at these answers, although they did not solve my problem.

tjebo
cibr
  • Maybe this post helps: https://stackoverflow.com/questions/69737016/how-to-include-labels-for-a-segmented-geom-smooth-in-r – Quinten Jul 10 '22 at 16:21
  • I am quite sure that you have to put a `unique` somewhere in your code! I guess that is where all the duplicates in the last provided figure come from. Good luck! – TarJae Jul 10 '22 at 16:33
  • @tjebo marked this question as a duplicate, pointing to a question on geom_line() with few data points. I was not able to solve my problem with any of the answers to the other post (although some of them referred to geom_smooth, which I use along with many data points). – cibr Jul 11 '22 at 21:13
  • Maybe this will help to understand the downvote: https://idownvotedbecau.se/nomcve/ and https://idownvotedbecau.se/noattempt/ – tjebo Jul 12 '22 at 07:51
  • @cibr I understand your frustration. I recommend reading the linked pages, especially the one about why it's useful to create an MCVE. Trying to recreate the problem with a "prototype" / MCVE can often help you come to the solution all by yourself. Additionally, a question asked without an MCVE is not very useful for other people in the future, as they will not be able to recreate the problem. – tjebo Jul 12 '22 at 12:50
  • I had to edit the question in order to remove the downvote. I hope the title is now a bit more pertinent to the actual problem, which then also makes it arguably slightly different from the [previously suggested duplicate](https://stackoverflow.com/questions/29357612/plot-labels-at-ends-of-lines?noredirect=1&lq=1). – tjebo Jul 12 '22 at 13:06

2 Answers

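Indeed, @TarJae is right: adding unique (here as stat = "unique" in geom_label_repel) is the way to go. The label data contains one row for every observation at the maximum age, so each label was being drawn many times over.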

[...]
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      guides(color = "none") +
      geom_label_repel(
        data = group_by(., name) %>%
          do(augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted), 

        # "unique" added (and max.overlaps part dropped)
        stat = "unique",

        # Some *minor* prettifying
        direction = "y",
        nudge_x = 0.7
      )
  } +
[...]

Current result, without further manipulation/prettifying:

[Figure: one solution, using stat = "unique"]

So, the warning message leads down the wrong road in this case.


Concerning tjebo's suggestion in the other answer: I believe adding stat = "unique" is a better solution than the second chunk in tjebo's answer (although the resulting figure is still not what I want).

library(tidyverse)
library(ggrepel)

df <- 
  diamonds %>% select(age = table, name = color, value = price)
df %>%
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      geom_label_repel(
        data = group_by(., name) %>%
          do(broom::augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted),
        stat = "unique"
      )
  } 
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

[Figure: better, but not the final solution]
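
Something that may matter for further fine-tuning: the console message shows geom_smooth() drawing a gam fit, while the label positions are computed with loess(), so the labels are not guaranteed to sit on the ends of the drawn curves. A minimal sketch, assuming it is acceptable to draw loess curves instead; the name label_data and the distinct() step are additions for illustration, not part of the code above:

```r
library(tidyverse)
library(ggrepel)

df <- diamonds %>% select(age = table, name = color, value = price)

# Label positions: loess fit per group, evaluated at each group's largest age.
# distinct() keeps a single row per group, so stat = "unique" is not needed here.
label_data <- df %>%
  group_by(name) %>%
  do(broom::augment(loess(value ~ age, .))) %>%
  filter(age == max(age)) %>%
  ungroup() %>%
  distinct(name, .keep_all = TRUE)

p <- ggplot(df, aes(age, value, color = name)) +
  # method and formula are set explicitly so the drawn curve uses the same
  # kind of smoother (loess with default settings) as label_data
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  geom_label_repel(
    data = label_data,
    aes(age, .fitted, label = name),
    direction = "y",
    nudge_x = 0.7
  ) +
  guides(color = "none")

p
```

Note that loess can be slow on large groups, which is presumably why geom_smooth() picks gam by default for this much data.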

cibr

I am answering because I'd like to demonstrate a way to reproduce the problem. There is hardly any problem that cannot be reproduced with one of the built-in data sets. For your problem, you could use the diamonds data set that ships with ggplot2. It has a similar number of rows and similar columns (numeric and categorical).

library(tidyverse)
library(ggrepel)

df <- 
  diamonds %>% select(age = table, name = color, value = price)
df %>%
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      geom_label_repel(
        data = group_by(., name) %>%
          do(broom::augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted)
      )
  } 
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Here, on a small scale, you can see repeated labels - that's the core of the problem, and we have reproduced your problem.
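
To check where the repeats come from, one could count the rows of the label data per group. A sketch, assuming the same diamonds-based df as above; label_data and label_data_unique are illustrative names, not part of the original code:

```r
library(dplyr)
library(broom)
library(ggplot2)  # only needed here for the diamonds data set

df <- diamonds %>% select(age = table, name = color, value = price)

# Same pipeline as in the data = argument of geom_label_repel()
label_data <- df %>%
  group_by(name) %>%
  do(augment(loess(value ~ age, .))) %>%
  filter(age == max(age)) %>%
  ungroup()

# A count above 1 means several observations share that group's maximum age,
# and each of those rows is drawn as its own label
count(label_data, name)

# Keeping one row per group gives exactly one label per curve
label_data_unique <- distinct(label_data, name, .keep_all = TRUE)
```

Passing label_data_unique as the data argument should have a similar effect to stat = "unique", with the deduplication done explicitly before plotting.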

Two answers in the linked thread suggest how to label geom_smooth-derived curves. Here is an adaptation of my own suggestion, using the geomtextpath package. I admit the labels look awkwardly rotated in this case, but I guess with less awkward lines, such as in your example, it should look nicer.


## fix this using one of the suggested solutions in thread 
## https://stackoverflow.com/questions/29357612/plot-labels-at-ends-of-lines?noredirect=1&lq=1
library(geomtextpath)
ggplot(df, aes(age, value, label = name, color = name)) +
  ## note: you currently have to specify the method argument, otherwise the disambiguation of some function fails
  ## (see also https://github.com/AllanCameron/geomtextpath/issues/79)
  geom_labelsmooth(hjust = 1, method = "loess") 
#> `geom_smooth()` using formula 'y ~ x'

Created on 2022-07-12 by the reprex package (v2.0.1)

tjebo
  • I see the repeated E labels and the repeated J labels in the first figure. But I'm sorry to say, I don't find the second figure satisfying. To me, the figure in my answer seems much better. Since my knowledge of R code is limited, I will have to stop trying to use the second code chunk in this answer to develop something I'm satisfied with. I think the figure developed by adding "unique" (as suggested by TarJae) suffices. This solution also maintains the use of geom_smooth() and geom_label_repel(). So I'll stick with my solution, which actually is TarJae's solution. – cibr Jul 12 '22 at 17:12