
I am working with JHU data on coronavirus infections, and I'm trying to compute new cases (and deaths) by group. Here's the code:

```r
library(tidyr)   # gather()
library(dplyr)   # %>%, group_by(), arrange(), mutate()

base <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-"
world.confirmed <- read.csv(paste0(base, "Confirmed.csv"), header = TRUE)
world.confirmed <- gather(world.confirmed, Date, Cases, X1.22.20:X3.21.20)

world.deaths <- read.csv(paste0(base, "Deaths.csv"), header = TRUE)
world.deaths <- gather(world.deaths, Date, Deaths, X1.22.20:X3.21.20)

world.data <- merge(world.confirmed, world.deaths,
                    by = c("Province.State", "Country.Region", "Lat", "Long", "Date"))

world.data$Date <- as.Date(world.data$Date, "X%m.%d.%y")
world.data <- world.data %>%
    group_by(Province.State, Country.Region, Date) %>%
    arrange(Province.State, Country.Region, as.Date(Date))
```
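A minimal offline sketch of the reshape-and-parse step above, using a made-up two-row table (hypothetical values) in place of the JHU download. It also swaps the hard-coded `X1.22.20:X3.21.20` range for `starts_with("X")`; that is an assumption about the layout, namely that every date column, and only the date columns, begins with `X` after `read.csv()` sanitises the headers.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table mimicking the JHU layout after read.csv():
# date headers such as "1/22/20" become syntactic names like "X1.22.20".
wide <- data.frame(
  Province.State = c("", ""),
  Country.Region = c("Afghanistan", "Albania"),
  X1.22.20 = c(0, 0),
  X1.23.20 = c(0, 1),
  stringsAsFactors = FALSE
)

# Assumption: starts_with("X") matches all date columns and nothing else,
# so the call keeps working when the upstream file gains a new day.
long <- gather(wide, Date, Cases, starts_with("X"))

# Parse the "X%m.%d.%y" labels into real Date objects.
long$Date <- as.Date(long$Date, "X%m.%d.%y")
print(long)
```

Each wide date column becomes one block of rows in `long`, and the parsed `Date` column can then be sorted chronologically.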

Following the solutions to a similar question on SO, I have tried to compute differences by group with something like this:

```r
world.data <- world.data %>%
   group_by(Lat, Long) %>%
   mutate(New.Cases = Cases - lag(Cases))
```

That does not work, however, and no other grouping does either. Here are the results at the boundary between the first two countries:

[Screenshot: value for the first element of Albania]

I have also tried inserting an arrange step, and even setting the first element of each group to zero. Same problem. Any idea?

Update: I'm using R 3.4.4 and dplyr 0.8.5.

jjmerelo

1 Answer


This might help:

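```r
library(dplyr)

world.data %>%
  # Convert "X1.22.20"-style labels into Date objects
  mutate(Date = as.Date(Date, "X%m.%d.%y")) %>%
  # Sort each location's rows chronologically
  arrange(Country.Region, Lat, Long, Date) %>%
  # One group per location (country plus coordinates)
  group_by(Country.Region, Lat, Long) %>%
  # Day-over-day change; the first day of each group is NA
  mutate(New_Cases = Cases - lag(Cases),
         New_deaths = Deaths - lag(Deaths))
```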

We sort the data by date and compute New_Cases by subtracting the previous day's count from the current day's count within each location (country plus coordinates), and do the same for deaths.
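To see what this pipeline should produce, here is a small reproducible sketch on made-up data (two hypothetical locations, `A` and `B`, three days each). It calls `dplyr::lag()` with its namespace so that base R's `stats::lag()`, which is meant for time series, cannot be picked up instead. If the first row of each group does not come out as `NA`, the grouping is probably not being honoured by the `mutate()` that actually runs.

```r
library(dplyr)

# Hypothetical toy data: cumulative case counts for two made-up locations.
toy <- data.frame(
  Country.Region = rep(c("A", "B"), each = 3),
  Date = as.Date("2020-01-22") + c(0, 1, 2, 0, 1, 2),
  Cases = c(1, 3, 6, 10, 15, 15),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  arrange(Country.Region, Date) %>%   # chronological order within each location
  group_by(Country.Region) %>%        # one group per location
  mutate(New.Cases = Cases - dplyr::lag(Cases)) %>%  # explicit dplyr::lag
  ungroup()

print(toy$New.Cases)
# NA  2  3 NA  5  0
```

The `NA` in the fourth position shows that `B`'s first day is not differenced against `A`'s last day.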

Ronak Shah
  • Not really. The dates were already formatted. The only difference I see is that you're grouping and arranging by more columns; anyway, that does not work either (and I don't really see why it should, except that you're sorting by date, but as in the example, sorting was already taken care of). – jjmerelo Mar 22 '20 at 13:05
  • @jjmerelo It would be good to know what you mean by "working". Can you show what your expected output would look like? If it's difficult to share it for the original data, please create a small reproducible example and show the output for that example. – Ronak Shah Mar 22 '20 at 13:42
  • I got the same result as above. The problem, as indicated in the comment, was the R version. – jjmerelo Mar 22 '20 at 17:14
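If `lag()` itself is in doubt, the same per-group difference can be written without it. The sketch below uses base R's `diff()` inside the grouped `mutate()`, on made-up data (hypothetical values); `c(NA, diff(x))` pads the first day of each group with `NA`, which matches what `x - lag(x)` returns.

```r
library(dplyr)

# Hypothetical toy data: cumulative cases and deaths for two made-up locations.
toy <- data.frame(
  Country.Region = rep(c("A", "B"), each = 3),
  Date = as.Date("2020-01-22") + c(0, 1, 2, 0, 1, 2),
  Cases = c(1, 3, 6, 10, 15, 15),
  Deaths = c(0, 0, 1, 0, 1, 2),
  stringsAsFactors = FALSE
)

daily <- toy %>%
  arrange(Country.Region, Date) %>%
  group_by(Country.Region) %>%
  # diff() returns one fewer element than its input, so pad the first day with NA
  mutate(New_Cases = c(NA, diff(Cases)),
         New_deaths = c(NA, diff(Deaths))) %>%
  ungroup()

print(daily)
```

Because `diff()` only ever sees the rows of the current group, no value can leak across a group boundary.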