I need to find all local maxima of a geom_smooth() curve in R. This has been asked on Stack Overflow before:
How can I get the peak and valleys of a geom_smooth line in ggplot2?
But the answer there only covers finding a single maximum. What if there are multiple local maxima we want to find?
Here's some sample data:
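library(tidyverse)
set.seed(404)
# Noisy sine wave over two full periods (0 to 4*pi).
# Note: rnorm(100, 0, 1) is recycled across the 1000 rows.
df <- data.frame(x = seq(0, 4*pi, length.out = 1000),
                 y = sin(seq(0, 4*pi, length.out = 1000)) + rnorm(100, 0, 1))
df %>% ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_smooth()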
To find a single maximum, we use the function underlying geom_smooth() to get the y values of the curve. By default this is gam() for 1000 or more data points and loess() for fewer than 1000. In this case, it's gam() from library(mgcv). Finding the maximum is then a simple matter of subsetting with which.max(). We can plot the modeled y values over geom_smooth() to confirm they're the same, with our maximum represented by a vertical line:
library(mgcv)
# Fit the same kind of smoother and store its fitted values
df <- df %>%
  mutate(smooth_y = predict(gam(y ~ s(x, bs = "cs"), data = df)))
# x position of the single highest point on the fitted curve
maximum <- df$x[which.max(df$smooth_y)]
df %>% ggplot() +
  geom_point(aes(x = x, y = y)) +
  geom_smooth(aes(x = x, y = y)) +
  geom_line(aes(x = x, y = smooth_y), size = 1.5, linetype = 2, col = "red") +
  geom_vline(xintercept = maximum, color = "green")
So far, so good. But which.max() returns only the single highest point, and there is more than one local maximum here. Maybe we're trying to find the periodicity of the sine wave, measured as the average distance between maxima. How do we make sure we find all maxima in the series?
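To make the goal concrete, here is a minimal sketch on a noise-free sine curve rather than the smoothed data above. The helper name find_local_maxima is hypothetical, and the approach (flagging points where the first difference switches from positive to negative) is just one possible definition of a local maximum, not necessarily the best one for a fitted curve.

```r
# Hypothetical helper: indices where the curve switches from rising to falling.
find_local_maxima <- function(y) {
  # sign(diff(y)) is +1 while rising and -1 while falling;
  # a change of -2 marks a rise followed directly by a fall.
  which(diff(sign(diff(y))) == -2) + 1
}

# Noise-free sine wave on the same grid as the sample data
x <- seq(0, 4 * pi, length.out = 1000)
y <- sin(x)

peaks <- x[find_local_maxima(y)]
peaks              # x positions of the two peaks, near pi/2 and 5*pi/2
mean(diff(peaks))  # average distance between peaks, close to 2*pi
```

On this clean curve the average spacing comes out close to the true period of 2*pi. Note that this sketch would miss a peak sitting on a perfectly flat plateau, where the difference is exactly zero.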
I am posting my answer below, but I am wondering if there's a more elegant solution than the brute-force method I used.