I have a cohort study in which participants have a date-of-birth and dates when they entered and exited the study.
I am trying to calculate the time-at-risk (i.e. duration in the study) by age, sex and year.
For example, a participant who enters the study in June 2005 aged 40.8 (in decimal years) and remains in the study for one year would contribute 0.2 years to 2005 at age 40, 0.3 years to 2005 at age 41, and 0.5 years to 2006 at age 41.
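To make that arithmetic concrete, here is a minimal sketch of the split for a single participant. The helper name `split_follow_up` is made up for illustration, the entry date of 30 June 2005 is an assumption chosen to match the example, and birthdays are approximated as `dob` plus multiples of 365.25 days (the same convention as decimal age), so this is not exact calendar arithmetic.

```r
# Hypothetical helper: split one follow-up period at every 1 January and
# every (approximate) birthday, then label each piece by year and age.
split_follow_up <- function(dob, entry, exit) {
  years <- as.integer(format(entry, "%Y")):as.integer(format(exit, "%Y"))
  new_years <- as.Date(paste0(years, "-01-01"))
  # Assumption: a year of age is 365.25 days, so birthdays may fall mid-day
  birthdays <- dob + (0:150) * 365.25
  cuts <- c(new_years, birthdays)
  cuts <- cuts[cuts > entry & cuts < exit]  # keep cut points inside follow-up
  breaks <- sort(unique(c(entry, cuts, exit)))
  start <- head(breaks, -1)
  end <- tail(breaks, -1)
  # Label by interval midpoint to avoid boundary ambiguity
  mid <- start + (end - start) / 2
  data.frame(
    year = as.integer(format(mid, "%Y")),
    age = floor(as.numeric(mid - dob) / 365.25),
    time_at_risk = as.numeric(end - start) / 365.25
  )
}

# The participant from the example: aged 40.8 at entry, followed for one year
entry <- as.Date("2005-06-30")
dob <- entry - 40.8 * 365.25
exit <- entry + 365.25
example_split <- split_follow_up(dob, entry, exit)
print(example_split)
```

This gives three rows (2005 at age 40, 2005 at age 41, 2006 at age 41) whose durations sum to one year. What I am unsure about is how best to do this for every participant and then sum by year, sex and age.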
The data looks like this:
N <- 1000
set.seed(50)
d <- data.frame(
  sex = sample(c('m', 'f'), N, replace = TRUE, prob = c(0.7, 0.3)),
  dob = sample(seq(as.Date('1960/01/01'), as.Date('1985/01/01'), by = "day"), N, replace = TRUE),
  study_entry = sample(seq(as.Date('2000/01/01'), as.Date('2010/01/01'), by = "day"), N, replace = TRUE)
)
d$study_exit <- d$study_entry + runif(N, 10, 2000)
d$age_entry <- as.numeric(d$study_entry - d$dob) / 365.25
I am trying to create a summary table of follow-up duration (total time at risk in years) that looks like this:
+--------+------+-----+--------------+
| year   | sex  | age | time at risk |
+--------+------+-----+--------------+
| 2000   | male | 20  | ....         |
+--------+------+-----+--------------+
| 2000   | male | 21  | ....         |
+--------+------+-----+--------------+
| 2000   | male | 22  | ....         |
+--------+------+-----+--------------+
| etc... | ...  | ... | ...          |
+--------+------+-----+--------------+
How would you go about this?