I am writing a function to calculate the duration of overlap between three periods, but I am having trouble working out how to program this efficiently, so hopefully someone can help me out.
I have a dataset of people who have been followed over time. The starting date and the time spent in the study differ between participants. For each participant, I would like to calculate how many days they were in the study in a specific year, and in which 5-year age category those days fall. For example, if someone was in the study from 01-01-2000 to 01-06-2001 (dd-mm-yyyy) and was born on 15-06-1965, he would contribute 166 days to the 30-34 year age category in 2000, 200 days to the 35-39 year age category in 2000 and 151 days to the 35-39 year age category in 2001, while he spent 0 days in all other categories.
In other words: I would like to quantify the overlap between these periods:
A = entering study to ending study (varies among participants, but fixed value within participant)
B = begin specific year to end specific year (same among participants, varies within participant)
C = entering specific 5-yr age category to exiting specific 5-yr age category (varies among participants, varies within participant)
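As far as I can tell, the overlap of several periods is the earliest end minus the latest start, set to zero when that difference is negative. A minimal sketch of that idea, assuming each period is half-open (the end date itself is not counted) and using a hypothetical helper name `overlap_days`:

```r
# Days of overlap between three half-open date intervals [start, end).
# Assumption: the end date itself is not counted, which reproduces the
# 166 / 200 / 151 days of the example above.
# overlap_days is a hypothetical helper name.
overlap_days <- function(start.A, end.A, start.B, end.B, start.C, end.C) {
  latest.start <- pmax(as.numeric(start.A), as.numeric(start.B), as.numeric(start.C))
  earliest.end <- pmin(as.numeric(end.A), as.numeric(end.B), as.numeric(end.C))
  # A negative difference means the three periods never coincide.
  pmax(0, earliest.end - latest.start)
}

d <- as.Date  # shorthand for the example calls only

# Study period x year 2000 x age 30-34 (born 1965-06-15)
days.30.2000 <- overlap_days(d("2000-01-01"), d("2001-06-01"),
                             d("2000-01-01"), d("2001-01-01"),
                             d("1995-06-15"), d("2000-06-15"))  # 166
# Study period x year 2001 x age 35-39
days.35.2001 <- overlap_days(d("2000-01-01"), d("2001-06-01"),
                             d("2001-01-01"), d("2002-01-01"),
                             d("2000-06-15"), d("2005-06-15"))  # 151
```

Because `pmax` and `pmin` work element-wise, the same helper should also accept whole columns of dates for A and C together with a single year for B.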
My data looks something like this:
dat <- data.frame(lapply(
  data.frame("Birth" = c("1965-06-15", "1960-02-01", "1952-05-02"),
             "Begin" = c("2000-01-01", "2003-08-14", "2007-12-05"),
             "End"   = c("2001-06-01", "2006-10-24", "2012-03-01")),
  as.Date))
So far I have come up with this, but I do not know how to proceed (or whether I should take a totally different approach):
spec.fu <- function(years, birth, begin, end, age.cat, data){
  birth <- data[, birth]
  start.A <- data[, begin]
  end.A <- data[, end]
  for (i in years){
    start.B <- as.Date(paste(i, "01-01", sep = "-"))
    end.B <- as.Date(paste(i + 1, "01-01", sep = "-"))
    for (j in age.cat){
      start.C <- paste((as.numeric(format(birth, "%Y")) + j),
                       format(birth, "%m-%d"), sep = "-")
      end.C <- paste((as.numeric(format(birth, "%Y")) + j + 5),
                     format(birth, "%m-%d"), sep = "-")
      result <- ?????
      data[, ncol(data) + ?????] <- result
      colnames(data)[ncol(data) + ?????] <- paste("fu", j, "in", i, sep = "")
    }
  }
  return(data)
}
And use it like this:
newdata <- spec.fu(years = 2000:2001, birth = "Birth", begin = "Begin",
                   end = "End", age.cat = seq(30, 35, 5), data = dat)
So, in this case, I want to make 2 (no. of age categories) * 2 (no. of years) = 4 new columns for each participant, each containing the no. of days that someone has spent in the study in that specific category (e.g. in age category 30-34 in 2001).
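To make the intended result concrete, here is a rough sketch of how the missing step might be filled in. It is one possible approach, not necessarily the most efficient one. It assumes half-open periods (the end date is not counted), adds the new columns by name instead of by position, and uses a hypothetical helper `shift_years` to move the birth date forward by a number of years:

```r
dat <- data.frame(
  Birth = as.Date(c("1965-06-15", "1960-02-01", "1952-05-02")),
  Begin = as.Date(c("2000-01-01", "2003-08-14", "2007-12-05")),
  End   = as.Date(c("2001-06-01", "2006-10-24", "2012-03-01"))
)

# Hypothetical helper: move dates forward by n calendar years.
# Assumption: no special handling of 29 February birthdays.
shift_years <- function(dates, n) {
  lt <- as.POSIXlt(dates)
  lt$year <- lt$year + n
  as.Date(lt)
}

spec.fu <- function(years, birth, begin, end, age.cat, data) {
  birth.d <- data[[birth]]
  # Work with dates as day counts so pmin/pmax return plain numbers.
  start.A <- as.numeric(data[[begin]])
  end.A   <- as.numeric(data[[end]])
  for (i in years) {
    # Period B: the calendar year i, as [1 Jan i, 1 Jan i+1)
    start.B <- as.numeric(as.Date(paste(i, "01-01", sep = "-")))
    end.B   <- as.numeric(as.Date(paste(i + 1, "01-01", sep = "-")))
    for (j in age.cat) {
      # Period C: from the j-th birthday up to the (j+5)-th birthday
      start.C <- as.numeric(shift_years(birth.d, j))
      end.C   <- as.numeric(shift_years(birth.d, j + 5))
      # Overlap = earliest end - latest start, floored at zero
      days <- pmax(0, pmin(end.A, end.B, end.C) - pmax(start.A, start.B, start.C))
      data[[paste("fu", j, "in", i, sep = "")]] <- days
    }
  }
  data
}

newdata <- spec.fu(years = 2000:2001, birth = "Birth", begin = "Begin",
                   end = "End", age.cat = seq(30, 35, 5), data = dat)
```

For the first participant this gives 166, 200, 0 and 151 days in the columns fu30in2000, fu35in2000, fu30in2001 and fu35in2001, matching the example above. The other two participants get 0 everywhere, since they were not in the study in 2000 or 2001.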
Hopefully I was able to clearly explain my problem.
Many thanks in advance!