I’m trying to determine how to set the span argument for geom_smooth() based on meaningful units from my data. As an example, let’s say I have a daily time series, with lower values on weekends (see bottom of post for data and code):

I’d like to smooth over a 7-day window, to smooth out the periodic dips due to weekends, but otherwise maximize resolution of the smoothed line — similar to a 7-day moving average.
My question is: how do I translate something like "7 days" into the correct value for span?
According to this SO answer, span sets the alpha parameter for the loess regression. The answer quotes Jacoby, 2000:
alpha gives the proportion of observations that is to be used in each local regression. Accordingly, this parameter is specified as a value between 0 and 1. The alpha value used for the loess curve in Fig. 2 is 0.65; so, each of the local regressions used to produce that curve incorporates 65% of the total data points.
Based on this, I tried setting span to the number of days per week (7) divided by the number of days in the data (nrow(mydata)):
library(ggplot2)
ggplot(mydata, aes(date, value)) +
  geom_point() +
  geom_smooth(se = FALSE, span = 7 / nrow(mydata))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
But this doesn't smooth out the weekend dips:
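To make the arithmetic behind that span value explicit, here is a minimal base-R sketch. It assumes my reading of ?loess is right: that span times the number of observations gives roughly the number of points in each local fit, and that those points get tricube weights scaled by the distance to the farthest point in the window. The names n, points_per_fit, d and w are just illustrative.

```r
# Number of daily observations in the example (2020-01-01 to 2020-04-01).
n <- length(seq(as.Date("2020-01-01"), as.Date("2020-04-01"), by = "day"))

# The span I tried: 7 days as a fraction of all observations.
span <- 7 / n
points_per_fit <- span * n  # about 7 observations per local regression

# Tricube weights for a 7-point window centred on one day.
# Assumption: distances are scaled by the distance to the farthest point
# in the window, as described in ?loess.
d <- abs(-3:3)
w <- (1 - (d / max(d))^3)^3

print(points_per_fit)
print(round(w, 3))
```

If that reading is correct, the two outermost days in each window get zero weight, so the window does not weight a full week evenly. The default loess fit also uses a local polynomial rather than a plain mean, which may be part of why the result differs from a 7-day moving average.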
Data:
library(tidyverse)
library(lubridate)

set.seed(1)

mydata <- tibble(
  date = seq(ymd("2020-01-01"), ymd("2020-04-01"), by = 1)
) %>%
  mutate(
    # weekdays() returns locale-dependent names; this assumes an English locale
    value = if_else(
      weekdays(date) %in% c("Saturday", "Sunday"),
      rnorm(n(), 10, 3),  # lower values on weekends
      rnorm(n(), 50, 10)
    ),
    value = if_else(
      date > ymd("2020-02-15"),
      value + rnorm(n(), 20, 2),  # stepwise increase after Feb 15
      value
    )
  )

ggplot(mydata, aes(date, value)) +
  geom_point()
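For comparison, this is the kind of 7-day moving average I have in mind. It is a minimal sketch on a hypothetical toy series (x is made up for illustration and is not the data above), using stats::filter() from base R with equal weights over a centred 7-day window.

```r
# Hypothetical toy series: four weeks of 50 on weekdays and 10 on weekends.
x <- rep(c(50, 50, 50, 50, 50, 10, 10), times = 4)

# Centred 7-day moving average; the first and last 3 values are NA
# because the full window does not fit there.
ma7 <- stats::filter(x, rep(1 / 7, 7), sides = 2)

print(as.numeric(ma7))
```

Because every window covers exactly five weekdays and two weekend days, the weekly dips cancel out and the average is flat. The stats:: prefix is there because dplyr masks filter() once the tidyverse is loaded.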