Edited to show code I already have...
My job is to predict how much money a movie will make over its first 15 weeks on cable platforms. I do this by fitting a separate regression for each of the first 14 weeks, but I need to automate the steps involved in each regression:
Subset the full data set by week (14 weeks in total), giving 14 distinct data frames.
df.names = paste("data", 1:14, sep = "")
for (i in 1:14) {
  d.frame = subset(myData, Week == i)
  assign(df.names[i], d.frame)
}
Split each week's data frame into training and test sets.
set.seed(101)
train_idx = sample(1:nrow(data1), round(0.7 * nrow(data1)), replace = FALSE)
data1_train = data1[train_idx, ]
data1_test = data1[-train_idx, ]
Run a linear regression on the training set for each week.
Week1_Regress = lm(x ~ coef1 + coef2 + ... + coefi, data = data1_train)
Extract the coefficients for each regression into a CSV file.
write.csv(Week1_Regress$coef,"Selected Folder")
Calculate the RMSE using the test set and extract that into a CSV.
test = predict(Week1_Regress, data1_test)
rmse = function(test, obs) {
  sqrt(sum((obs - test)^2) / length(test))
}
I can do each step individually, but I am looking for a loop or lapply solution so that I don't have to type out 14 versions of the 5 steps.
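For reference, here is a minimal sketch of the shape such a solution might take, using `split()` and `lapply()` in place of `assign()` and 14 separately named data frames. It is not a definitive implementation: the toy `myData`, the predictor names `coef1` and `coef2`, the helper `fit_week()`, and the output file names are all hypothetical stand-ins, since the real predictors and target folder are not shown here. The `rmse()` helper uses `mean()`, which is equivalent to the sum divided by the length.

```r
# Hypothetical toy data standing in for myData. The columns x, coef1 and
# coef2 are assumptions; the real model has more predictors.
set.seed(101)
myData <- data.frame(
  Week  = rep(1:14, each = 50),
  coef1 = rnorm(700),
  coef2 = rnorm(700)
)
myData$x <- 2 + 3 * myData$coef1 - myData$coef2 + rnorm(700)

# Root mean squared error between predictions and observed values.
rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# Hypothetical helper: split one week's data 70/30, fit the model on the
# training part, and return the coefficients plus the test-set RMSE.
fit_week <- function(d) {
  train_idx <- sample(seq_len(nrow(d)), round(0.7 * nrow(d)))
  train <- d[train_idx, ]
  test  <- d[-train_idx, ]
  fit   <- lm(x ~ coef1 + coef2, data = train)
  list(coef = coef(fit), rmse = rmse(predict(fit, test), test$x))
}

# Step 1: one data frame per week, kept in a named list.
weekly <- split(myData, myData$Week)

# Steps 2 to 5, applied to every week.
results <- lapply(weekly, fit_week)

# Collect the results: one row per week in each table.
coef_table <- do.call(rbind, lapply(results, `[[`, "coef"))
rmse_table <- data.frame(Week = names(results),
                         RMSE = sapply(results, `[[`, "rmse"))

# Placeholder output paths; tempdir() is used only so the sketch runs anywhere.
write.csv(coef_table, file.path(tempdir(), "weekly_coefficients.csv"))
write.csv(rmse_table, file.path(tempdir(), "weekly_rmse.csv"), row.names = FALSE)
```

Keeping the weekly data frames and results in lists means each week can be reached by name, for example `results[["3"]]$rmse`, and the whole run produces two CSV files instead of 28.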