
I came across this article in The New York Times today about coronavirus and I liked how the graphs were presented. I know the bar plots can be made with geom_col() in ggplot2, but I am more interested in the smoothing part, as in this graph:

[Image: the New York Times graph, bars with a red seven-day moving-average line]

They said that "each red line is the seven-day moving average, which smooths out day-to-day anomalies..." How do you do that? I have a dataset that I plan to present in a similar way.

Thanks!

Edward
mand3rd
See these: https://uc-r.github.io/ts_moving_averages and https://stackoverflow.com/questions/743812/calculating-moving-average (Tung, Mar 20 '20 at 01:12)

2 Answers


data.table also has a rolling mean function, frollmean, which can be used for this purpose:

library(data.table)
library(ggplot2)
library(scales)

# create some data
set.seed(1)
DT <- data.table(N = rescale(dnorm(seq(-10, 10, by=.1)) + 
        runif(201, -.1, .1), c(1, 800)))

# apply rolling mean over 10 data points
DT[, `:=`(rollN = frollmean(N, n = 10, align = "center"), idx = .I)]

ggplot(DT, aes(x = idx, y = N)) +
    theme_bw() +
    geom_line() +                                                # original data
    geom_line(aes(y = rollN), colour = "red", linewidth = 2) +   # rolling mean (use size = 2 on ggplot2 < 3.4.0)
    geom_histogram(aes(x = idx, weight = N / 10), binwidth = 10,
                   inherit.aes = FALSE, fill = "red", alpha = .2) # bars: mean of N per 10-point bin
#> Warning: Removed 9 row(s) containing missing values (geom_path).

Created on 2020-03-19 by the reprex package (v0.3.0)
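To match the question more closely, here is a minimal sketch of a seven-day trailing average (each day averaged with the six days before it) drawn as a red line over geom_col() bars. The daily counts are made-up illustration data, align = "right" assumes a trailing rather than centered window is what the Times uses, and linewidth assumes ggplot2 3.4.0 or later:

```r
library(data.table)
library(ggplot2)

# Hypothetical daily counts, made up for illustration (not real case data)
daily <- data.table(
  date  = seq(as.Date("2020-03-01"), by = "day", length.out = 14),
  cases = c(1, 3, 2, 5, 8, 6, 10, 12, 9, 15, 20, 18, 25, 30)
)

# Seven-day trailing average: each value is the mean of that day and the six
# days before it; the first six rows are NA because the window is incomplete
daily[, avg7 := frollmean(cases, n = 7, align = "right")]

# Bars for the raw daily counts, a red line for the moving average
p <- ggplot(daily, aes(x = date)) +
  geom_col(aes(y = cases), fill = "pink") +
  geom_line(aes(y = avg7), colour = "red", linewidth = 1) +
  theme_bw()
print(p)
```

Switching to align = "center" gives a centered window instead. The six leading NA values are why ggplot2 warns about removed rows when drawing the line.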

user12728748

This takes the 3-period moving average of the 3 points up to and including the current point. The first two points are NA because fewer than 3 points are available there; the third is (1+2+3)/3=2, the fourth is (2+3+4)/3=3, and so on. Omit fill = NA if you don't want the NAs (the result is then 2 elements shorter than the input). If you want centered moving averages, use rollmean instead, i.e. remove the r at the end of rollmeanr.

library(zoo)
x <- 1:10 # test input
rollmeanr(x, 3, fill = NA)
## [1] NA NA  2  3  4  5  6  7  8  9
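For comparison, the centered version (rollmean, without the trailing r) places each average in the middle of its window, so the NA padding moves to both ends. A small sketch on the same test input:

```r
library(zoo)
x <- 1:10  # same test input as above

# Centered 3-point moving average: position i averages x[i-1], x[i], x[i+1]
rollmean(x, 3, fill = NA)
## [1] NA  2  3  4  5  6  7  8  9 NA
```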

To average over 3 or fewer points (so the start of the series is not NA), use rollapplyr with partial = TRUE. Here the first output is just 1 because the average of 1 is 1, the second is (1+2)/2=1.5, and the rest are as above.

rollapplyr(x, 3, mean, partial = TRUE)
## [1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

See ?rollapply for more information.
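If you would rather not add a package, base R's stats::filter() can compute the same trailing average. This is a swapped-in technique, not part of the answer above; the stats:: prefix is there because dplyr, if loaded, masks filter(). A minimal sketch on the same test input:

```r
x <- 1:10  # same test input as above

# Trailing 3-point mean with base R only: sides = 1 combines the current value
# with the two previous ones; the result is a ts object, so as.numeric() drops that class
trail <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))
trail
## [1] NA NA  2  3  4  5  6  7  8  9
```

Use rep(1/7, 7) as the filter for a seven-day average, or sides = 2 for a centered window.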

G. Grothendieck