
I am trying to reproduce on my data frame a difference-in-differences (DiD) analysis as performed by Callaway and Sant'Anna (2021). Because treatment timing varies across units, I need to define a variable, first.treat, that reports for each ID the year in which it first became treated (Treatment = 0 if not treated, 1 otherwise). For units that are never treated, first.treat should be 0. Below is a simplified data frame: I have the variables ID, Year, and Treatment, and I need to create first.treat as follows.

ID Year Treatment first.treat
a 2016 0 2017
a 2017 1 2017
a 2018 1 2017
b 2016 1 2016
b 2017 1 2016
b 2018 1 2016
c 2016 0 2018
c 2017 0 2018
c 2018 1 2018
d 2016 0 0
d 2017 0 0
d 2018 0 0

How can I do this in R? Thank you.

    Welcome to SO! Please see [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) and [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You can provide your data via the output of `dput(df)`. – AndrewGB Jan 04 '22 at 17:24

1 Answer


Next time, please provide your data in a more R-friendly format, for example:

df <- 
  tibble::tribble(
    ~ID, ~Year, ~Treatment, ~first.treat,
    "a", 2016L,         0L,        2017L,
    "a", 2017L,         1L,        2017L,
    "a", 2018L,         1L,        2017L,
    "b", 2016L,         1L,        2016L,
    "b", 2017L,         1L,        2016L,
    "b", 2018L,         1L,        2016L,
    "c", 2016L,         0L,        2018L,
    "c", 2017L,         0L,        2018L,
    "c", 2018L,         1L,        2018L,
    "d", 2016L,         0L,           0L,
    "d", 2017L,         0L,           0L,
    "d", 2018L,         0L,           0L
  )

Luckily, the datapasta package let me convert your table into the code above with little effort, but it may not be widely known.
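If tibble or datapasta is not available, the same three input columns can be typed in with base R's data.frame(). This is only a sketch that mirrors the question's column names; calling dput(df) afterwards prints an exact, paste-able version of the data, which is what the comment on the question recommends.

```r
# Base-R version of the example data (no extra packages needed)
df <- data.frame(
  ID        = rep(c("a", "b", "c", "d"), each = 3),  # 3 years per unit
  Year      = rep(2016:2018, times = 4),              # integer years
  Treatment = c(0L, 1L, 1L,   # a: treated from 2017
                1L, 1L, 1L,   # b: treated from 2016
                0L, 0L, 1L,   # c: treated from 2018
                0L, 0L, 0L)   # d: never treated
)

# Print an exact, paste-able representation of the data
dput(df)
```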

Here's a solution to your problem:

library(dplyr)
df %>% 
  group_by(ID) %>% 
  mutate(first.treat = min(
    if_else(Treatment == 1, Year, NA_integer_),
    na.rm = TRUE
  )) %>% 
  ungroup()
#> # A tibble: 12 x 4
#>    ID     Year Treatment first.treat
#>    <chr> <int>     <int>       <dbl>
#>  1 a      2016         0        2017
#>  2 a      2017         1        2017
#>  3 a      2018         1        2017
#>  4 b      2016         1        2016
#>  5 b      2017         1        2016
#>  6 b      2018         1        2016
#>  7 c      2016         0        2018
#>  8 c      2017         0        2018
#>  9 c      2018         1        2018
#> 10 d      2016         0         Inf
#> 11 d      2017         0         Inf
#> 12 d      2018         0         Inf

Created on 2022-01-04 by the reprex package (v2.0.1)

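Here we compute, within each ID, the minimum of a modified Year variable in which Year is set to NA whenever Treatment is not 1. Note that for ID d, which is never treated, every value is NA, so min(..., na.rm = TRUE) returns Inf (with a warning) and the column becomes double; those Inf values still have to be replaced with the 0 the question asks for.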
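To get 0 for never-treated units directly, one option is to handle the "no treated year" case explicitly instead of letting min() return Inf. Below is a minimal base-R sketch, assuming df has the columns ID, Year and Treatment coded 0/1; the helper first_treat() is a hypothetical name introduced here, not part of any package.

```r
# Example data (same as in the question, without first.treat)
df <- data.frame(
  ID        = rep(c("a", "b", "c", "d"), each = 3),
  Year      = rep(2016:2018, times = 4),
  Treatment = c(0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L)
)

# Hypothetical helper: earliest treated year, or 0 if the unit is never treated
first_treat <- function(year, treatment) {
  treated_years <- year[treatment == 1]
  if (length(treated_years) == 0) 0L else min(treated_years)
}

# ave() splits its first argument by ID and applies FUN to each piece.
# FUN needs both Year and Treatment, so we pass row indices and look
# both columns up; the scalar result is recycled over the group's rows.
df$first.treat <- ave(
  seq_len(nrow(df)), df$ID,
  FUN = function(i) first_treat(df$Year[i], df$Treatment[i])
)

df
```

The same if/else logic can also go inside the dplyr mutate() call, e.g. `if (any(Treatment == 1)) min(Year[Treatment == 1]) else 0L`, which avoids the Inf and the warning.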

Iaroslav Domin