I am a researcher working on a COVID-19 dataset (publicly available from European CDC). I am trying to get R to compute a “cumulative cases” variable which sums the total cases by “dateRep” from all previous date of the particular country. See attached screenshot from Excel. Any thoughts on how you would code and compute this in R? Thanks, I appreciate it alot!
Asked
Active
Viewed 133 times
-2
-
1Please include data as copy/pasteable text, not as images or links. See [how to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Jan Boyer May 04 '20 at 16:16
1 Answers
1
Using data.table, where your dataframe is called df1
first converting it to a data.table, then using the cumsum()
function by (and ordered by, because I've used keyby =
rather than by =
) country.
library(data.table)
setDT(df1)
dt1[, cases_cumulative := cumsum(cases), keyby = countriesAndTerritories]
Data.tables are based on the basic R data.frame data structure, but offer improved functionality, efficiency and speed. Data.table annotation uses DT[i,j,by]
- which very simpply is do subsetting or ordering on i
, then select or do j
grouped by
.
Alternatively using tidyverse, use group_by()
and mutate()
:
library(tidyverse)
df1 %>%
group_by(countriesAndTerritories) %>%
mutate(cum_cases = cumsum(cases))

rg255
- 4,119
- 3
- 22
- 40
-
Really appreciate the coding help. Right now cumulative cases count the case based on the order of top and bottom, but the date variable ("dateRep")'s dates are latest on top and oldest on the bottom. So the cumulative cases's were summed up on reversed based on your previous code. I would need to sum the cumulative cases based on previous date's sum + current date's new cases stacking from bottom to top. What would you suggest to account for the reversed ordering of the "dateRep" variable? Thanks! – Doug C. May 04 '20 at 11:08
-
In the data.table version you can order on `i` so add the order function like so: `dt1[order(dateRep), cases_cumulative := cumsum(cases), keyby = countriesAndTerritories]` note that it might be `order(-dateRep)` – rg255 May 04 '20 at 12:04
-
Tidyverse option would be to add `arrange(dateRep) %>%` before the group_by – rg255 May 04 '20 at 12:11