How do you calculate the mean of the outcome, grouped by individual, according to the time of event?

Question

I'm doing an event-study project, and I want to calculate the average outcome.

Suppose we have 3 individuals. The event occurs to individual 1 in 2019, to individual 2 in 2020, and to individual 3 in 2017.

The outcome of individual 1 in 2019 is 1, and the outcome of individual 2 in 2020 is 0. For individual 3 the event predates the survey years, so individual 3 is excluded when calculating the average. The average probability should be 0.5 in this case.

How can I do this in R?

Thank you!

Here's the artificial data:

ID <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
year <- c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020)
outcome <- c(1, 1, 0, 0, 1, 0, 0, 0, 1)
event_year <- c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
df <- data.frame(ID, year, outcome, event_year)
df

> df
  ID year outcome event_year
1  1 2018       1       2019
2  1 2019       1       2019
3  1 2020       0       2019
4  1 2021       0       2019
5  2 2019       1       2020
6  2 2020       0       2020
7  2 2021       0       2020
8  3 2018       0       2017
9  3 2020       1       2017

If the event happened in 2019 for ID = 1, why is there a 1 in 2018? — Chamkrai, Jun 19 '22 at 11:42
@TomHoel the outcome is the "event"'s outcome. I want to know the effect of the event on the outcome of another event. — Ludwig Gershwin, Jun 19 '22 at 11:53
Does this answer your question? [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-e-g-sum-mean) — user438383, Aug 03 '22 at 06:59

score 0 · Answer 1 · answered Jun 19 '22 at 12:53

If I understand your question correctly, you should group_by year like this:

library(dplyr)
df %>%
  group_by(year) %>%
  summarise(mean = mean(outcome))

Output:

# A tibble: 4 × 2
   year  mean
  <dbl> <dbl>
1  2018 0.5  
2  2019 1    
3  2020 0.333
4  2021 0
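
The grouping above gives means per calendar year. If the goal is instead the question's worked example (the mean outcome in each individual's own event year), one possible sketch is to keep only the rows where `year` equals `event_year` and then average. This assumes "time of event" means exactly that match. The names `at_event` and `event_mean` are illustrative only.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# Keep only the row observed in each individual's event year.
# Individual 3 has no row for 2017, so it drops out here.
at_event <- df %>%
  filter(year == event_year)

# Average the outcome over the remaining individuals (0.5 for this data)
event_mean <- at_event %>%
  summarise(mean = mean(outcome))

event_mean
```

With this data, only individuals 1 and 2 remain, with outcomes 1 and 0, which matches the 0.5 expected in the question.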

score 0 · Answer 2 · answered Jun 19 '22 at 15:29

In base R, we may do

aggregate(outcome ~ year, df, mean)

Answered by akrun, Jun 19 '22 at 15:29
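
A base R variant of the same idea, sketched for the event-study setting in the question: average the outcome by year relative to the event instead of by calendar year. It assumes that individuals never observed in their own event year (individual 3 here) should be left out entirely, which the question only states for the event-year average. The names `ids_observed`, `sub`, `rel_year` and `by_rel` are illustrative only.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  year       = c(2018, 2019, 2020, 2021, 2019, 2020, 2021, 2018, 2020),
  outcome    = c(1, 1, 0, 0, 1, 0, 0, 0, 1),
  event_year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2017, 2017)
)

# IDs that are observed in their own event year
# (assumption: everyone else is excluded from the averages)
ids_observed <- unique(df$ID[df$year == df$event_year])
sub <- df[df$ID %in% ids_observed, ]

# Years since the event: 0 is the event year, -1 the year before, and so on
sub$rel_year <- sub$year - sub$event_year

# Mean outcome at each relative year
by_rel <- aggregate(outcome ~ rel_year, data = sub, FUN = mean)
by_rel
```

The row with `rel_year` equal to 0 is the event-year average from the question, 0.5 for this data.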
