
I have a changing df and I am grouping it by the values of c. With ggplot2 I plot it with the following code to get a scatterplot with one linear regression line (geom_smooth) per group:

ggplot(df, aes(x = a, y = b, group = c)) + 
  geom_point(shape = 1, aes(color = c), alpha = 0.3) +
  geom_smooth(method = "lm", aes(group = c, color = c), se = FALSE)

Now I want to label each geom_smooth line in the plot with the value of its group c. This has to be dynamic, because I cannot write new code every time my df changes.


For example, my df looks like this:

  a     b     c
----------------
 1.6    24   100
-1.4    43   50
 1      28   100
 4.3    11   50
-3.45   5.2  50

So in this case I would get two geom_smooth lines in the plot (one for c = 100, one for c = 50), each with a different color.

Now I simply want to add a text label "100" next to the geom_smooth line for group c = 100, a label "50" next to the line for group c = 50, and so on. As new groups are introduced in the df, new geom_smooth lines are plotted, and they need to be labeled too.


The whole code for the plot:

 ggplot(aes(x = a, y = b, group = c), data = df, na.rm = TRUE) + 
  geom_point(aes(color = GG, size = factor(c)), alpha=0.3) +
  scale_x_continuous(limits = c(-200,2300))+
  scale_y_continuous(limits = c(-1.8,1.5))+
  geom_hline(yintercept=0, size=0.4, color="black") +
  scale_color_distiller(palette="YlGnBu", na.value="white") +
  geom_smooth(method = "lm", aes(group = factor(GG), color = GG), se = F) +
  geom_label_repel(data = labelInfo, aes(x= max, y = predAtMax, label = label, color = label))
Max

3 Answers


You can do it if you pick the location where you want each line labelled. Below, I label each line at its far right end (the largest a in each group) and use ggrepel to avoid overlapping labels:

library(ggplot2)
library(ggrepel)
library(dplyr)

set.seed(12345)

df <- 
  data.frame(
    a = rnorm(100,2,0.5)
    , b = rnorm(100, 20, 5)
    , c = factor(sample(c(50,100,150), 100, TRUE))
  )

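# For each group, fit b ~ a and predict b at the group's largest a;
# that point is where the label will sit
labelInfo <-
  split(df, df$c) %>%
  lapply(function(x){
    data.frame(
      predAtMax = lm(b~a, data=x) %>%
        predict(newdata = data.frame(a = max(x$a)))
      , max = max(x$a)
    )}) %>%
  bind_rows

# split() returns the groups in level order, so the labels line up with the rows
labelInfo$label = levels(df$c)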

ggplot(
  df
  , aes(x = a, y = b, color = c)
  ) + 
  geom_point(shape = 1) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_label_repel(data = labelInfo
                   , aes(x= max
                         , y = predAtMax
                         , label = label
                         , color = label))
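If df contains missing values, or c is not already a factor, the code above can misbehave: max() returns NA for every position, and levels() returns NULL. The sketch below guards against both by dropping incomplete rows, converting c to a factor, and taking the labels from the same split() that defines the rows, so their order cannot drift. The example data, including the injected NAs, is made up for illustration; it assumes numeric columns a and b and a grouping column c, as in the question.

```r
library(ggplot2)
library(ggrepel)

set.seed(12345)

# Made-up example data: numeric (non-factor) `c` plus a few missing values
df <- data.frame(
  a = rnorm(100, 2, 0.5),
  b = rnorm(100, 20, 5),
  c = sample(c(50, 100, 150), 100, TRUE)
)
df$a[c(3, 17)] <- NA
df$c[42] <- NA

# Drop incomplete rows, then make `c` a factor so that levels() works
df <- na.omit(df)
df$c <- factor(df$c)

# For each group: the largest `a` and the fitted `b` at that `a`
groups <- split(df, df$c)
labelInfo <- do.call(rbind, lapply(groups, function(x) {
  fit <- lm(b ~ a, data = x)
  data.frame(
    max = max(x$a),
    predAtMax = unname(predict(fit, newdata = data.frame(a = max(x$a))))
  )
}))
# Labels come from split() itself, so they are in the same order as the rows
labelInfo$label <- factor(names(groups), levels = levels(df$c))

p <- ggplot(df, aes(x = a, y = b, color = c)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_label_repel(
    data = labelInfo,
    aes(x = max, y = predAtMax, label = label, color = label),
    show.legend = FALSE
  )
p
```

Giving label the same factor levels as c keeps the labels on the same discrete color scale as the points and lines.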
Mark Peterson
  • Thanks, clever solution! I'm pretty new, but it does not work with my df because I have some `NA`s in `a`, `b` and `c`. Couldn't I just `filter(!is.na(a)) %>%` before your code? – Max Jun 23 '16 at 16:05
  • I also cannot use `levels(df$c)`: it returns `NULL` (it should be around 10 values). – Max Jun 23 '16 at 16:25
  • `levels(df$c)` only works if the variable is a factor (I set mine to be a factor). – Mark Peterson Jun 23 '16 at 16:32
  • What is not working when you have the `NA`s? If you want, you could use `df <- na.omit(df)` before running anything to remove rows with missing data. Alternatively, add `na.rm = TRUE` to each of the calls to `max`. – Mark Peterson Jun 23 '16 at 16:36
  • @Max You might try `unique` instead of `levels` if you aren't working with a factor. – aosmith Jun 23 '16 at 16:48
  • I used `levels(factor())`, so the labels worked! But I cannot get them into my plot. My plot is: `df %>% filter(!is.na(a)) %>% filter(!is.na(c)) %>% tbl_df() %>% ggplot()` – Max Jun 23 '16 at 16:51
  • A warning about `unique` though: I believe that `split` converts to a factor, and so returns in alphabetical order if the levels are not already set. `unique` returns in the order it encounters values. So, the resulting order of `split` and `unique` may differ. – Mark Peterson Jun 23 '16 at 16:52
  • Did you remove the `NA` before creating the `labelInfo` table as well? That is where you are most likely to be seeing the issue. What does `labelInfo` look like? If the `NA` are not removed, and `na.rm = TRUE` is not set in `max()`, you will just be getting `NA` for all of the positions – Mark Peterson Jun 23 '16 at 16:59
  • `labelInfo` looks correct; I can easily plot it with your code on another plot! Is it because of this: `df %>% filter(!is.na(a)) %>% filter(!is.na(c)) %>% tbl_df() %>% ggplot()+...`? Maybe I cannot get it from that ggplot? – Max Jun 23 '16 at 17:07
  • Ok, I added the whole code for the ggplot in the original question! – Max Jun 23 '16 at 17:23
  • What error is actually occurring? I don't have a `GG` column, so I am not sure how that relates. It appears that you are trying to use a continuous color scale, but the data you originally showed was discrete. Is `GG` a new grouping column, and you are just using column `c` for coloring? If so, you will need to be careful about how you color the lines, etc. A minimal working example would be helpful, particularly if the example that I generated is not actually matching what you are doing. – Mark Peterson Jun 23 '16 at 20:17

This method might work for you. It uses ggplot_build to access the rightmost point of each actual geom_smooth line and adds a label next to it. Below is an adaptation of Mark Peterson's example.

library(ggplot2)
library(ggrepel)
library(dplyr)

set.seed(12345)

df <- 
  data.frame(
    a = rnorm(100,2,0.5)
    , b = rnorm(100, 20, 5)
    , c = factor(sample(c(50,100,150), 100, TRUE))
  )

p <-
  ggplot(df, aes(x = a, y = b, color = c)) + 
  geom_point(shape = 1) +
  geom_smooth(method = "lm", se = FALSE)

# Layer 2 is geom_smooth: keep the rightmost fitted point of each line
p.smoothedmaxes <- 
  ggplot_build(p)$data[[2]] %>% 
  group_by(group) %>% 
  filter(x == max(x))

# Label each line end with its fitted y value, drawn in the line's colour
p +
  geom_text_repel(data = p.smoothedmaxes, 
                  mapping = aes(x = x, y = y, label = round(y, 2)), 
                  col = p.smoothedmaxes$colour,
                  inherit.aes = FALSE)
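The labels above show the fitted y value. To print the group value instead, which is what the question asks for, the group column of the built layer data can be mapped back to the levels of c. This is a sketch that assumes c is a factor with no unused levels, in which case ggplot2 numbers the groups in level order; the name ends is made up, and the example data is repeated so the block runs on its own.

```r
library(ggplot2)
library(ggrepel)
library(dplyr)

set.seed(12345)
df <- data.frame(
  a = rnorm(100, 2, 0.5),
  b = rnorm(100, 20, 5),
  c = factor(sample(c(50, 100, 150), 100, TRUE))
)

p <- ggplot(df, aes(x = a, y = b, color = c)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 is geom_smooth; keep the rightmost fitted point of each line
ends <- ggplot_build(p)$data[[2]] %>%
  group_by(group) %>%
  filter(x == max(x)) %>%
  ungroup() %>%
  # Assumption: group i corresponds to the i-th level of `c`
  mutate(label = levels(df$c)[group])

p_labelled <- p +
  geom_text_repel(data = ends,
                  mapping = aes(x = x, y = y, label = label),
                  colour = ends$colour,
                  inherit.aes = FALSE)
p_labelled
```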


Joel Buursma

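This came up for me today, and I landed on this solution, which passes a lambda to the layer's data argument (data = ~fn(.x)) so the label positions are computed from the plot's own data: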

library(tidyverse)
library(broom)

mpg |>
  ggplot(aes(x = displ, y = hwy, colour = class, label = class)) +
  geom_count(alpha = 0.1) +
  stat_smooth(alpha = 0.6, method = lm, geom = "line", se = FALSE) +
  geom_text(
    aes(y = .fitted), size = 3, hjust = 0, nudge_x = 0.1,
    data = ~{
      nest_by(.x, class) |>
        summarize(broom::augment(lm(hwy ~ displ, data = data))) |>
        slice_max(order_by = displ, n = 1)
    }
  ) +
  scale_x_continuous(expand = expansion(add = c(0, 1))) +
  theme_minimal()

Or wrap it in a function:

#' @examples
#' last_lm_points(df = mpg, formula = hwy~displ, group = class)
last_lm_points <- function(df, formula, group) {
  # The right-hand side of the formula is the x variable (a symbol, e.g. displ)
  x_arg <- formula[[3]]
  df |> 
    nest_by({{group}}) |> 
    summarize(broom::augment(lm(formula, data = data))) |>
    # Inject the symbol so slice_max() orders by that column
    slice_max(order_by = !!x_arg, n = 1)
}

mpg |>
  ggplot(aes(displ, hwy, colour = class, label = class)) +
  geom_count(alpha = 0.1) +
  stat_smooth(alpha = 0.6, method = lm, geom = "line", se = FALSE) +
  geom_text(
    aes(y = .fitted), size = 3, hjust = 0, nudge_x = 0.1,
    data = ~last_lm_points(.x, hwy~displ, class)
  ) +
  scale_x_continuous(expand = expansion(add = c(0, 1))) +
  theme_minimal()
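On dplyr 1.1.0 or later, a summarize() that returns several rows per group (as broom::augment() does here) emits a deprecation warning, and slice_max() keeps ties by default, so a class with several cars at its largest displ gets duplicate, overplotted labels. A minimal sketch that avoids both, swapping broom::augment() for predict() and assuming dplyr 1.1.0 or later for the .by argument and pick(); the helper name end_of_fit is made up:

```r
library(ggplot2)
library(dplyr)  # assumes dplyr >= 1.1.0 for `.by` and pick()

# Hypothetical helper: one row per class with the fitted hwy at that class's largest displ
end_of_fit <- function(d) {
  d |>
    summarise(
      # Computed first, while `displ` still refers to the full column of the group
      .fitted = unname(predict(lm(hwy ~ displ, data = pick(displ, hwy)),
                               newdata = data.frame(displ = max(displ)))),
      displ = max(displ),
      .by = class
    )
}

label_pts <- end_of_fit(mpg)

p <- mpg |>
  ggplot(aes(displ, hwy, colour = class, label = class)) +
  geom_count(alpha = 0.1) +
  stat_smooth(alpha = 0.6, method = lm, formula = y ~ x, geom = "line", se = FALSE) +
  geom_text(aes(y = .fitted), size = 3, hjust = 0, nudge_x = 0.1,
            data = ~ end_of_fit(.x)) +
  scale_x_continuous(expand = expansion(add = c(0, 1))) +
  theme_minimal()
p
```

Because summarise() yields exactly one row per class here, no slice_max() step is needed.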


yake84