I would like to build separate models for the different segments of my data. I have built the models like so:
log1 <- glm(y ~ ., family = "binomial", data = train, subset = x1==0)
log2 <- glm(y ~ ., family = "binomial", data = train, subset = x1==1 & x2<10)
log3 <- glm(y ~ ., family = "binomial", data = train, subset = x1==1 & x2>=10)
If I run predict() on the training data, R remembers the subsets, and each prediction vector has the length of its respective subset.
However, if I run predict() on the testing data via newdata, each prediction vector has the length of the whole testing dataset, not that of the matching subset.
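To make the behaviour reproducible, here is a minimal sketch on simulated data. The data, the helper make_data(), and the extra predictor x3 are made up for illustration and are not my real data. I also narrowed the formula to y ~ x2 + x3 here, because x1 is constant within the subset.

```r
# Simulated stand-ins for my data; make_data() and x3 are hypothetical.
set.seed(1)
make_data <- function(n) {
  data.frame(
    y  = rbinom(n, 1, 0.5),
    x1 = rbinom(n, 1, 0.5),
    x2 = runif(n, 0, 20),
    x3 = rnorm(n)
  )
}
train <- make_data(200)
Test  <- make_data(200)

# Fit on one segment only, using glm's subset argument.
log1 <- glm(y ~ x2 + x3, family = "binomial", data = train, subset = x1 == 0)

# Without newdata: one fitted value per row of the training subset.
n_fit <- length(predict(log1, type = "response"))

# With newdata: one prediction per row of Test; the subset is not re-applied.
n_new <- length(predict(log1, newdata = Test, type = "response"))

c(n_fit = n_fit, train_subset = sum(train$x1 == 0),
  n_new = n_new, test_rows = nrow(Test))
```

So the subset condition only affects which rows are used for fitting; it is not stored as a filter for newdata.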
My question is whether there is a simpler way to achieve what I currently do: subset the testing data, run the predictions on each subset, concatenate the predictions, rbind the subsets, and append the concatenated predictions, like this:
T1 <- subset(Test, x1==0)
T2 <- subset(Test, x1==1 & x2<10)
T3 <- subset(Test, x1==1 & x2>=10)
log1pred <- predict(log1, newdata = T1, type = "response")
log2pred <- predict(log2, newdata = T2, type = "response")
log3pred <- predict(log3, newdata = T3, type = "response")
allpred <- c(log1pred, log2pred, log3pred)
TAll <- rbind(T1, T2, T3)
TAll$allpred <- allpred
I suspect I am missing something obvious and there is an easier way to do this: fit many models on small subsets of the data, then combine them to get predictions on the full testing data. How should I combine them?
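One direction that might work, as a sketch only: keep Test intact, preallocate a prediction column, and fill it segment by segment with logical indexing, so no rbind is needed and the rows stay in their original order. The simulated data, make_data(), x3, and the segs list are made up for illustration, and the formula is narrowed to y ~ x2 + x3 because x1 is constant within each segment. It also assumes the segments do not overlap and together cover every row of Test.

```r
# Simulated stand-ins for my data; make_data() and x3 are hypothetical.
set.seed(1)
make_data <- function(n) {
  data.frame(
    y  = rbinom(n, 1, 0.5),
    x1 = rbinom(n, 1, 0.5),
    x2 = runif(n, 0, 20),
    x3 = rnorm(n)
  )
}
train <- make_data(300)
Test  <- make_data(300)

# One model per segment, as in the question.
log1 <- glm(y ~ x2 + x3, family = "binomial", data = train, subset = x1 == 0)
log2 <- glm(y ~ x2 + x3, family = "binomial", data = train, subset = x1 == 1 & x2 < 10)
log3 <- glm(y ~ x2 + x3, family = "binomial", data = train, subset = x1 == 1 & x2 >= 10)

# Pair each model with a logical vector marking its rows in Test.
segs <- list(
  list(model = log1, rows = Test$x1 == 0),
  list(model = log2, rows = Test$x1 == 1 & Test$x2 < 10),
  list(model = log3, rows = Test$x1 == 1 & Test$x2 >= 10)
)

# Preallocate, then fill each segment's rows in place.
Test$allpred <- NA_real_
for (s in segs) {
  Test$allpred[s$rows] <- predict(s$model, newdata = Test[s$rows, ],
                                  type = "response")
}

head(Test)
```

Any row that matches none of the conditions would keep NA in allpred, which at least makes gaps in the segmentation visible.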