
I am working with a COVID dataset, and I need a counter that starts on the first day the virus appeared in each country.



I have been trying with this code:

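library(dplyr)

data1 <- data1 %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  # counts from each country's first date in the data, not from its first confirmed case
  mutate(Counter = Date - first(Date) + 1)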

But this only gives me a counter starting from the first date in the data. How can I make day 1 the day on which confirmed is 1 for the first time?

Here is the example data:

structure(list(Date = structure(c(1577836800, 1577923200, 1578009600, 
1578096000, 1578182400, 1578268800, 1578355200, 1578441600, 1577836800, 
1577923200, 1578009600, 1578096000, 1578182400, 1578268800, 1578355200, 
1578441600, 1577836800, 1577923200, 1578009600, 1578096000, 1578182400, 
1578268800, 1578355200, 1578441600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), country = c("Afganistan", "Afganistan", "Afganistan", 
"Afganistan", "Afganistan", "Afganistan", "Afganistan", "Afganistan", 
"Colombia", "Colombia", "Colombia", "Colombia", "Colombia", "Colombia", 
"Colombia", "Colombia", "France", "France", "France", "France", 
"France", "France", "France", "France"), confirmed = c(0, 0, 
0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 2, 3, 3, 3, 0, 0, 0, 0, 0, 1, 1, 
1)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
))
Jorge Paredes

1 Answer


To get the first Date within each country group where the number of confirmed cases is greater than 0, you can use Date[which(confirmed > 0)][1]. For dates on or after that first confirmed date, you can calculate the counter by taking the difference, similar to what you had tried; earlier dates get a counter of 0.
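As a minimal base-R sketch of what that indexing expression does for a single country (the vectors below are hypothetical values that mirror the Afganistan rows): which() returns the positions where the condition is TRUE, and [1] keeps the first of them, or NA if there are none.

```r
# Hypothetical vectors for one country, one row per consecutive day
dates <- as.Date("2020-01-01") + 0:7
confirmed <- c(0, 0, 0, 0, 0, 1, 1, 2)

# which() gives the positions where confirmed > 0 (here 6, 7, 8);
# [1] keeps only the first of those positions
first_confirmed <- dates[which(confirmed > 0)][1]
print(first_confirmed)  # 2020-01-06

# A country with no confirmed cases: which() returns integer(0),
# and indexing the empty result with [1] yields NA instead of an error
none <- dates[which(rep(0, 8) > 0)][1]
print(none)  # NA
```

In the full dplyr code, an NA first_confirmed makes counter NA for every row of that country, because Date >= NA evaluates to NA inside ifelse().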

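library(dplyr)

df %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  # first date in each country with at least one confirmed case (NA if none)
  mutate(first_confirmed = Date[which(confirmed > 0)][1],
         # 0 before the first case, then 1, 2, 3, ... on each following day
         counter = ifelse(Date >= first_confirmed, Date - first_confirmed + 1, 0))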

Output

   Date       country    confirmed first_confirmed counter
   <date>     <chr>          <dbl> <date>            <dbl>
 1 2020-01-01 Afganistan         0 2020-01-06            0
 2 2020-01-02 Afganistan         0 2020-01-06            0
 3 2020-01-03 Afganistan         0 2020-01-06            0
 4 2020-01-04 Afganistan         0 2020-01-06            0
 5 2020-01-05 Afganistan         0 2020-01-06            0
 6 2020-01-06 Afganistan         1 2020-01-06            1
 7 2020-01-07 Afganistan         1 2020-01-06            2
 8 2020-01-08 Afganistan         2 2020-01-06            3
 9 2020-01-01 Colombia           0 2020-01-03            0
10 2020-01-02 Colombia           0 2020-01-03            0
11 2020-01-03 Colombia           1 2020-01-03            1
12 2020-01-04 Colombia           1 2020-01-03            2
13 2020-01-05 Colombia           2 2020-01-03            3
14 2020-01-06 Colombia           3 2020-01-03            4
15 2020-01-07 Colombia           3 2020-01-03            5
16 2020-01-08 Colombia           3 2020-01-03            6
17 2020-01-01 France             0 2020-01-06            0
18 2020-01-02 France             0 2020-01-06            0
19 2020-01-03 France             0 2020-01-06            0
20 2020-01-04 France             0 2020-01-06            0
21 2020-01-05 France             0 2020-01-06            0
22 2020-01-06 France             1 2020-01-06            1
23 2020-01-07 France             1 2020-01-06            2
24 2020-01-08 France             1 2020-01-06            3
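An alternative sketch that avoids date arithmetic, using dplyr's cumany() (a swapped-in technique, not part of the answer above): cumany(confirmed > 0) becomes TRUE on the first confirmed day and stays TRUE, and cumsum() counts those rows. This assumes each country has exactly one row per consecutive day, as in the sample data; if days are missing, it counts rows rather than calendar days. The data frame is rebuilt inline from the sample values so the block runs on its own.

```r
library(dplyr)

# Same sample values as the question, rebuilt with a Date column
df <- data.frame(
  Date = rep(as.Date("2020-01-01") + 0:7, times = 3),
  country = rep(c("Afganistan", "Colombia", "France"), each = 8),
  confirmed = c(0, 0, 0, 0, 0, 1, 1, 2,
                0, 0, 1, 1, 2, 3, 3, 3,
                0, 0, 0, 0, 0, 1, 1, 1)
)

res <- df %>%
  arrange(country, Date) %>%
  group_by(country) %>%
  # cumany() is FALSE until the first confirmed case, TRUE from then on;
  # cumsum() of that logical vector gives 0, ..., 0, 1, 2, 3, ...
  mutate(counter = cumsum(cumany(confirmed > 0))) %>%
  ungroup()

print(res)
```

For this sample, the counter column matches the one produced by the date-difference approach.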

Data

df <- structure(list(Date = structure(c(18262, 18263, 18264, 18265, 
18266, 18267, 18268, 18269, 18262, 18263, 18264, 18265, 18266, 
18267, 18268, 18269, 18262, 18263, 18264, 18265, 18266, 18267, 
18268, 18269), class = "Date"), country = c("Afganistan", "Afganistan", 
"Afganistan", "Afganistan", "Afganistan", "Afganistan", "Afganistan", 
"Afganistan", "Colombia", "Colombia", "Colombia", "Colombia", 
"Colombia", "Colombia", "Colombia", "Colombia", "France", "France", 
"France", "France", "France", "France", "France", "France"), 
    confirmed = c(0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 2, 3, 3, 
    3, 0, 0, 0, 0, 0, 1, 1, 1)), class = "data.frame", row.names = c(NA, 
-24L))
Ben