I'm attempting to find the 7-day rolling average of new cases and new deaths in a COVID-19 dataset as part of an assignment. I've already found the new cases and deaths per day. I just need to find the 7-day average for each day that there's enough data. I want to do this as a new column using the mutate function. However, when I put the function in, the data that shows up begins at day 4, not 7, and is mathematically incorrect. Any ideas?

I have tried the code below, and I should be getting these results:

[image: expected results]

us_totals %>%
  mutate(delta_deaths_7 = rollmean(deaths, k = 7, fill = NA))

which gets:

[image: incorrect output]

I know for a fact that the rolling average for that column is supposed to start on day 7, and the first row should have a value of 55.7.

I have also tried the slider function, a function I wrote myself to calculate the rolling mean, and every possible alignment of the rollmean function.

So far, everything has either yielded nothing but NA values, or the code seen above.

Some added detail: the code has been grouped by date already, and filtered to only pull data since March 15th, 2020. I realize that the rollmean could be pulling from filtered-out data but have no clue how to fix it.

  • hello, can you please include a reproducible example (which means including inputs and desired outputs) as code pasted into code blocks, NOT as images please - code in images can't be read by screen readers nor copied and pasted. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Paul Stafford Allen Aug 16 '23 at 13:33
  • for example, if you run `dput(us_totals)` and copy-paste the result into a code block by editing your question, then anyone trying to help has an exact copy of your data to demonstrate with. – Paul Stafford Allen Aug 16 '23 at 13:35

2 Answers


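By default, rollmean uses align = "center", which puts the result at the middle observation of the window. With k = 7, that is why the first non-NA value shows up on row 4 rather than row 7. Use align = "right" if you want the average aligned to the last observation in the 7-period window. Alternatively, you can use rollmeanr(), which is rollmean() with align = "right".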

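library(xts)  # also loads zoo, which provides rollmean() and rollmeanr()
data(sample_matrix)
z <- zoo(sample_matrix, as.Date(rownames(sample_matrix)))
# 7-period mean of the first column (Open), centered window (the default)
z$mean_center <- rollmean(z[,1], k = 7, fill = NA)
# same mean, aligned to the last observation of each window
z$mean_right <- rollmeanr(z[,1], k = 7, fill = NA)
head(z, 10)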
##                Open     High      Low    Close mean_center mean_right
## 2007-01-02 50.03978 50.11778 49.95041 50.11778          NA         NA
## 2007-01-03 50.23050 50.42188 50.23050 50.39767          NA         NA
## 2007-01-04 50.42096 50.42096 50.26414 50.33236          NA         NA
## 2007-01-05 50.37347 50.37347 50.22103 50.33459    50.21096         NA
## 2007-01-06 50.24433 50.24433 50.11121 50.18112    50.20454         NA
## 2007-01-07 50.13211 50.21561 49.99185 49.99185    50.15908         NA
## 2007-01-08 50.03555 50.10363 49.96971 49.98806    50.08256   50.21096
## 2007-01-09 49.99489 49.99489 49.80454 49.91333    50.05957   50.20454
## 2007-01-10 49.91228 50.13053 49.91228 49.97246    50.07093   50.15908
## 2007-01-11 49.88529 50.23910 49.88529 50.23910    50.11829   50.08256
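
The same shift can be checked on a toy vector. This is only a minimal sketch: `x` is made-up data, chosen so the window means are easy to verify by hand.

```r
library(zoo)

x <- 1:10  # toy data, purely illustrative

# Default alignment is "center": the mean of rows 1..7 is stored at row 4.
center <- rollmean(x, k = 7, fill = NA)

# Right alignment: the mean of rows 1..7 is stored at row 7.
right <- rollmean(x, k = 7, fill = NA, align = "right")

first_center <- which(!is.na(center))[1]  # position of first non-NA, centered
first_right  <- which(!is.na(right))[1]   # position of first non-NA, right-aligned

# rollmeanr() is shorthand for align = "right", so the two results should match.
same_as_rollmeanr <- identical(right, rollmeanr(x, k = 7, fill = NA))
```

Both versions contain the same four window means; only the row they are attached to differs.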
Joshua Ulrich
  • Thank you for the information. I got much closer to what I need, but I should have included the values I expect. I'm still getting the same incorrect values overall. I tried `us_totals %>% mutate(delta_deaths_7 = rollmeanr(deaths, k = 7, fill = NA))` and got the same values as before. My delta_deaths_7 column should start with the value 55.7, and my delta_cases_7 should start with 4210, on what would be day 8 of the data set. How would I get those values, and what else am I doing wrong? – Grace Cooper Aug 16 '23 at 19:07

Edit: I found out I was using the wrong column for the data. Thank you for your help! I needed to use the column for the increases in cases/deaths each day, not the base cases. Thanks for helping me learn, the trick about align right was a big help!
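
For later readers, here is a rough sketch of that fix. It assumes `deaths` holds cumulative totals; the data frame `us_totals_demo` and the column name `new_deaths` are hypothetical, and the numbers are made up rather than taken from the real dataset.

```r
library(dplyr)
library(zoo)

# Made-up cumulative counts, for illustration only (not the asker's data).
us_totals_demo <- tibble(
  date   = as.Date("2020-03-15") + 0:9,
  deaths = cumsum(seq(10, 100, by = 10))
)

result <- us_totals_demo %>%
  # Daily increase = today's cumulative total minus yesterday's (NA on row 1).
  mutate(new_deaths = deaths - dplyr::lag(deaths)) %>%
  # Drop the row with no previous day, so rollmeanr() gets no NA input.
  filter(!is.na(new_deaths)) %>%
  # Right-aligned 7-day mean of the daily increases, not of the cumulative totals.
  mutate(delta_deaths_7 = rollmeanr(new_deaths, k = 7, fill = NA))
```

The NA row is removed before rolling because zoo's documentation notes that the default rollmean method does not handle inputs containing NAs and points to rollapply for that case.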

  • This would be better as a comment on my answer, in response to your initial comment. As it is right now, it doesn't answer your original question and may attract down-votes because of that. – Joshua Ulrich Aug 16 '23 at 19:26
  • It also helps others if you select my answer as the 'correct' one, so others know it answered your question. – Joshua Ulrich Aug 16 '23 at 19:33