I am working with JHU data on coronavirus infections, and I'm trying to compute new cases (and deaths) by group. Here's the code:
base <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-"
world.confirmed <- read.csv(paste0(base,"Confirmed.csv"), sep=',', head=T)
world.confirmed <- gather( world.confirmed, Date, Cases, X1.22.20:X3.21.20)
world.deaths <- read.csv(paste0(base,"Deaths.csv"), sep=',', head=T)
world.deaths <- gather(world.deaths, Date, Deaths, X1.22.20:X3.21.20)
world.data <- merge(world.confirmed, world.deaths,
by=c("Province.State","Country.Region","Lat", "Long", "Date"))
world.data$Date <- as.Date(world.data$Date, "X%m.%d.%y")
world.data <- world.data %>%
group_by(Province.State,Country.Region,Date) %>%
arrange(Province.State, Country.Region, as.Date(Date))
Following solutions to this question in SO I have tried to compute differences by group using something like this:
world.data <- world.data %>%
group_by(Lat,Long) %>%
mutate(New.Cases = Cases - lag(Cases))
That does not work, however; any other grouping does not either. Here're results on boundary between two first countries:
I have tried also inserting an arrange
phase, and even trying to zero the first element of the group. Same problem. Any idea?
Update I'm using R 3.4.4 and dplyr_0.8.5