I have a dataset containing an observation month, a score, and an outcome. For each month, I need to fit a logistic regression on the data from 16 to 13 months earlier (months -16 to -13 relative to it), predicting outcome from score, and then apply the fitted model to that month's (month 0's) values. I can do this with a foreach loop, but I doubt it's the best way.
df<-data.frame(month=rep(1:94,times=20),score=abs(round(rnorm(n=94*20)*100,0)),outcome=abs(round(rnorm(n=94*20),0)))
df$outcome<-ifelse(df$outcome>1,0,df$outcome)
#logistic regression example (including scaling the results to provide the score modifier)
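library("foreach")
# column to hold the predicted probabilities; stays NA for months without a model
df$modelresult <- NA_real_
foreach(imonth = unique(df$month)[16:length(unique(df$month))]) %do% {
  # training window: months imonth-16 to imonth-13
  glmsubset <- df[df$month >= (imonth - 16) & df$month <= (imonth - 13), ]
  glmmodel <- glm(formula = outcome ~ score, data = glmsubset, family = binomial(link = logit))
  # apply the fitted model to the current month's rows
  df$modelresult[df$month == imonth] <- predict(glmmodel, newdata = df[df$month == imonth, ], type = "response")
}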
df$scoreadjustment<-log(df$modelresult/(1-df$modelresult))*(50/log(2))
df$adjscore<-round(df$score+ifelse(is.na(df$scoreadjustment),0,df$scoreadjustment),0)
df
So for month 94, a logistic regression should be fitted on the subset of months 78 to 81, and the resulting model should be applied to the scores in month 94, with the result stored in an additional column. That column would be populated for every month where month >= 16.
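To make the windowing concrete, here is a minimal base R sketch of the same step written as a function applied over months rather than as a foreach loop. The names window_months and predict_month are hypothetical helpers introduced only for illustration, and the fixed seed is an assumption added for reproducibility. I am not claiming this is faster or better, only that it expresses the offset window explicitly.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(month = rep(1:94, times = 20),
                 score = abs(round(rnorm(94 * 20) * 100, 0)),
                 outcome = abs(round(rnorm(94 * 20), 0)))
df$outcome <- ifelse(df$outcome > 1, 0, df$outcome)

# Hypothetical helper: the training months for a given target month,
# i.e. imonth - 16 up to imonth - 13.
window_months <- function(imonth, from = 16, to = 13) {
  (imonth - from):(imonth - to)
}

# Hypothetical helper: fit on the offset window, predict for the target month.
predict_month <- function(imonth, data) {
  train <- data[data$month %in% window_months(imonth), ]
  model <- glm(outcome ~ score, data = train, family = binomial(link = "logit"))
  predict(model, newdata = data[data$month == imonth, ], type = "response")
}

months <- sort(unique(df$month))
months <- months[months >= 16]
preds <- lapply(months, predict_month, data = df)

# Put each month's predictions back into the matching rows; earlier months stay NA.
keep <- df$month >= 16
df$modelresult <- NA_real_
df$modelresult[keep] <- unsplit(preds, df$month[keep])
```

For example, window_months(94) gives months 78 to 81, matching the description above. Note that for month 16 the window starts at month 0, which does not exist, so that model is fitted on months 1 to 3 only.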
I was hoping for a less loop-like construction, and people have pointed me to a couple of SO posts: "Is there a _fast_ way to run a rolling regression inside data.table?" and "R data.table sliding window".
However, the first, whilst performing a form of regression, does not use an offset month and is designed to return the coefficients rather than to apply the fitted model directly. The second performs an aggregation for a rolling median and also refers back to the first SO post. The first also requires strong knowledge of all the functions involved and offers little in the way of accessibility for someone with less expertise.
I am reading up on zoo, and rollapply from it in particular.