3

I have a geom_smooth plot with date on the x-axis, COVID cases on the y-axis, and two categories. I'm trying to mark the maximum peak of each smoothed curve.

# Reproducible data
library(tidyverse)
df <- tribble(~date, ~cases, ~category,
              "2021/1/1", 100, "A",
              "2021/1/1", 103, "B",
              "2021/1/2", 108, "A",
              "2021/1/2", 109, "B",
              "2021/1/3", 102, "A",
              "2021/1/3", 120, "B",
              "2021/1/4", 150, "A",
              "2021/1/4", 160, "B",
              "2021/1/5", 120, "A",
              "2021/1/5", 110, "B",
              "2021/1/6", 115, "A",
              "2021/1/6", 105, "B",)

# Plotting geom_smooth
df %>%
  ggplot(aes(date, cases, group = category, color = category)) +
  geom_smooth()

How do I add the maximum peak to the geom_smooth? Ideally, I want both a point and a text that says what the peak case is.

I tried finding the peaks outside of the ggplot code, but that returns a different peak because geom_smooth fits its own function to the data rather than simply taking the mean of each category.

The answer below worked, but I want to move the labels to make them more legible. geom_text_repel seems to refer only to the first curve rather than both. Any advice?

library(ggplot2)
library(tidyverse)
library(ggrepel)

# Fake data
ar =hist(rnorm(10000,1), breaks = 180, plot=F)$counts
br =hist(rnorm(11000,1), breaks = 180, plot=F)$counts

df <-  rbind(
  tibble(category="B", date = seq(as.Date("2021-01-01"),by=1, length.out=length(br)),value=br),
  tibble(category="A", date = seq(as.Date("2021-01-01"),by=1, length.out=length(ar)),value=ar)
)
# create the smooth and retain rows with max of smooth, using slice_max
sm_max = df %>% group_by(category) %>%
  mutate(smooth =predict(loess(value~as.numeric(date), span=.5))) %>% 
  slice_max(order_by = smooth)

# Plot, using the same smooth as above (default is loess, span set as above)
df %>%
  ggplot(aes(date, value, group = category, color = category)) +
  geom_point() +
  geom_smooth(span=.5, se=F) + 
  geom_point(data=sm_max, aes(y=smooth),color="black", size=5) + 
  geom_text_repel(data = sm_max, aes(label=paste0("Peak: ",round(smooth,1))), color="black")

geom_text_repel(data = sm_max_p3, aes(x = date,
                                      y = smooth,
                                      label = paste0(candidate, " Peak: ", round(smooth, 1))))


  • Your `date` is a `character` so it's a discrete variable. Therefore your `geom_smooth()` is just a point-to-point line that you could also get with `geom_line()`. You could convert it to a continuous variable with `mutate(df, date = lubridate::ymd(date))` for example. – Dan Adams Feb 17 '22 at 17:40
  • Also are you looking for the maximum *measured* value (i.e. actually present in your data) or the maximum value in the *smoothed* data? If the latter, you probably need to calculate that first outside `ggplot()` and then use something like [{gghighlight}](https://cran.r-project.org/web/packages/gghighlight/vignettes/gghighlight.html) to get the labels you're looking for. – Dan Adams Feb 17 '22 at 17:42

2 Answers

1

You need to compute the smooth yourself first and find its maximum. You can then either

  1. plot the data, the precomputed smooth, and the max together, or
  2. plot the data and the max, and call geom_smooth() as before, making sure it uses the same smoother and settings (method and span) that you used when computing the max.

Here is an example that uses the second option:

library(tidyverse)

# Fake data: histogram counts stand in for daily case counts
ar <- hist(rnorm(10000, 1), breaks = 180, plot = FALSE)$counts
br <- hist(rnorm(25000, 1), breaks = 180, plot = FALSE)$counts

df <- rbind(
  tibble(category = "B", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(br)), value = br),
  tibble(category = "A", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(ar)), value = ar)
)

# Fit a loess smooth per category and keep the row where the smooth is highest
sm_max <- df %>%
  group_by(category) %>%
  mutate(smooth = predict(loess(value ~ as.numeric(date), span = 0.5))) %>%
  slice_max(order_by = smooth)

# Plot with the same smooth as above: geom_smooth() defaults to loess for
# fewer than 1,000 observations, and span matches the fit above
ggplot(df, aes(date, value, group = category, color = category)) +
  geom_point() +
  geom_smooth(span = 0.5, se = FALSE) +
  geom_point(data = sm_max, aes(y = smooth), color = "black", size = 5) +
  geom_text(data = sm_max, aes(y = smooth, label = paste0("Peak: ", round(smooth, 1))), color = "black")
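
The first option (plotting the precomputed smooth itself) can be sketched as follows. This is a minimal illustration rather than part of the original answer: the `df_smooth` name, the fixed seed, and the `vjust` offset are assumptions made for the example. Because the fitted values are drawn with geom_line(), the curve and the marked peak come from the same fit by construction.

```r
library(tidyverse)

set.seed(1)  # assumption: fixed seed so the fake data is reproducible
ar <- hist(rnorm(10000, 1), breaks = 180, plot = FALSE)$counts
br <- hist(rnorm(25000, 1), breaks = 180, plot = FALSE)$counts

df <- bind_rows(
  tibble(category = "A", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(ar)), value = ar),
  tibble(category = "B", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(br)), value = br)
)

# Fit the loess once per category and store the fitted values as a column
df_smooth <- df %>%
  group_by(category) %>%
  mutate(smooth = predict(loess(value ~ as.numeric(date), span = 0.5))) %>%
  ungroup()

# One row per category: the date where the fitted curve is highest
sm_max <- df_smooth %>%
  group_by(category) %>%
  slice_max(order_by = smooth, n = 1, with_ties = FALSE) %>%
  ungroup()

# Draw the stored fit with geom_line() instead of refitting in geom_smooth()
p <- ggplot(df_smooth, aes(date, value, color = category)) +
  geom_point() +
  geom_line(aes(y = smooth)) +
  geom_point(data = sm_max, aes(y = smooth), color = "black", size = 5) +
  geom_text(data = sm_max,
            aes(y = smooth, label = paste0("Peak: ", round(smooth, 1))),
            color = "black", vjust = -1.5)
p
```

The trade-off is that you lose geom_smooth()'s confidence band unless you compute the standard errors yourself.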


langtang
  • This worked for me - but my peaks are much closer than yours, so the text is overlapping. I tried to use geom_text_repel, but it seems to only refer to the first curve, not the second. Any advice on how to make the labels more readable? – Annelise Dahl Feb 18 '22 at 19:24
  • I edited my post to show what it looks like now. I would like the labels to be more readable. – Annelise Dahl Feb 18 '22 at 19:29
  • Thanks @DanAdams, good catch! I've corrected this in the code. – langtang Feb 18 '22 at 20:02
  • @AnneliseDahl after adding the correction that Dan pointed out, you can use nudge_x and nudge_y to move the labels around (i.e. `geom_text(data, mapping, nudge_x, nudge_y, ...)`) – langtang Feb 18 '22 at 20:08
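
One thing worth checking in the geom_text_repel() call from the question: it maps `label` but not `y`, so `y` is inherited from the global `aes(date, value, ...)` and the labels are anchored at the raw value on the peak date rather than on the smoothed curve. The sketch below is a hedged illustration, not the asker's actual code: the seed, the label text, and the `nudge_y` and `box.padding` values are assumptions to tune for your own data.

```r
library(tidyverse)
library(ggrepel)

set.seed(42)  # assumption: fixed seed for reproducible fake data
ar <- hist(rnorm(10000, 1), breaks = 180, plot = FALSE)$counts
br <- hist(rnorm(11000, 1), breaks = 180, plot = FALSE)$counts

df <- bind_rows(
  tibble(category = "A", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(ar)), value = ar),
  tibble(category = "B", date = seq(as.Date("2021-01-01"), by = 1, length.out = length(br)), value = br)
)

# Peak of the per-category loess fit, one row per category
sm_max <- df %>%
  group_by(category) %>%
  mutate(smooth = predict(loess(value ~ as.numeric(date), span = 0.5))) %>%
  slice_max(order_by = smooth, n = 1, with_ties = FALSE) %>%
  ungroup()

p <- ggplot(df, aes(date, value, color = category)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5, se = FALSE) +
  geom_point(data = sm_max, aes(y = smooth), color = "black", size = 3) +
  geom_text_repel(
    data = sm_max,
    # Map y explicitly so each label is anchored at its own smoothed peak
    aes(y = smooth, label = paste0(category, " peak: ", round(smooth, 1))),
    color = "black",
    nudge_y = 15,            # assumption: upward offset in data units
    box.padding = 0.5,       # assumption: extra room between labels
    min.segment.length = 0   # always draw a segment from label to point
  )
p
```

Including the category in the label text also makes it clear which curve each label belongs to when the peaks are close together.
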
1

If you just want to label the maximum measured value, you can use {gghighlight} to show and label only that point on top of the smoothed curve. Note also that your date column is a character, so ggplot treats it as a discrete variable and your geom_smooth() is just a point-to-point line. Here, I convert it to a continuous variable with mutate(date = lubridate::ymd(date)).

library(tidyverse)
library(lubridate)
library(gghighlight)

df <- tribble(~date, ~cases, ~category,
              "2021/1/1", 100, "A",
              "2021/1/1", 103, "B",
              "2021/1/2", 108, "A",
              "2021/1/2", 109, "B",
              "2021/1/3", 102, "A",
              "2021/1/3", 120, "B",
              "2021/1/4", 150, "A",
              "2021/1/4", 160, "B",
              "2021/1/5", 120, "A",
              "2021/1/5", 110, "B",
              "2021/1/6", 115, "A",
              "2021/1/6", 105, "B",)

# Plotting geom_smooth
df %>%
  mutate(date = ymd(date)) %>%
  group_by(category) %>%
  mutate(is_max = cases == max(cases)) %>% 
  ggplot(aes(date, cases, color = category)) +
  geom_smooth() +
  geom_point(size = 3) +
  gghighlight(is_max,
              n = 1,
              unhighlighted_params = list(alpha = 0),
              label_key = cases)

Created on 2022-02-17 by the reprex package (v2.0.1)

Dan Adams