I just came across the purrr package, and I think it could help me with what I want to do; I just can't put the pieces together.
This is going to be a long post, but it covers a common use case that I think many others run into, so hopefully it will be useful to them as well.
This is what I'm aiming for:
- From one big dataset, fit multiple models on each of the subgroups.
- Keep these models readily available so I can examine them (coefficients, accuracy, etc.).
- From this saved list of models for the different groupings, apply each group's model to the matching group in the test set.
library(purrr)
library(dplyr)
grouping_vals = c("cyl", "vs")
set.seed(1)
train = mtcars
noise = sample(1:5, 32, replace = TRUE)
test = mtcars %>% mutate(hp = hp * noise) # just so the datasets aren't identical
models = train %>%
  group_by(across(all_of(grouping_vals))) %>% # group_by_() is deprecated; this groups by every name in grouping_vals
  do(linear_model1 = lm(mpg ~ hp, data = .),
     linear_model2 = lm(mpg ~ ., data = .))
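With models built like this, each model column is an ordinary list of lm objects, so purrr's typed map functions can pull coefficients out per group. A minimal sketch, assuming dplyr ≥ 1.0 and purrr; coef_summary, intercept, slope_hp and n_obs are names made up for illustration. The ungroup() call is there because do() can return a row-wise data frame, and map_*() should see the whole column. In mtcars the cyl = 4, vs = 0 group holds a single car, so its hp slope comes back as NA.

```r
library(dplyr)
library(purrr)

grouping_vals = c("cyl", "vs")

# Same kind of per-group fit as above, grouped on both columns.
models = mtcars %>%
  group_by(across(all_of(grouping_vals))) %>%
  do(linear_model1 = lm(mpg ~ hp, data = .))

# One row per group; map_dbl()/map_int() pull a single number out of each lm.
coef_summary = models %>%
  ungroup() %>%  # drop any row-wise grouping left by do()
  mutate(
    intercept = map_dbl(linear_model1, ~ coef(.x)[["(Intercept)"]]),
    slope_hp  = map_dbl(linear_model1, ~ coef(.x)[["hp"]]),  # NA if not estimable
    n_obs     = map_int(linear_model1, nobs)                 # rows used in each fit
  ) %>%
  select(all_of(grouping_vals), intercept, slope_hp, n_obs)

coef_summary
```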
- I've gotten this far, but I don't know how to 'map' each model onto the rows of the "test" dataset that share its grouping values.
- I might also want to take the residuals from fitting linear_model1 or linear_model2 and attach them to the matching groups of the training data.
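For the first point, one way to line each model up with its own test rows is to nest the test set by the same grouping columns and join on them. A minimal sketch, assuming dplyr ≥ 1.0 and tidyr ≥ 1.0; it refits with nest() and map() instead of do() so both sides are plain list columns, and fitted_models, test_nested, test_data, test_preds and pred are illustrative names. predict() will likely warn about a rank-deficient fit for the one-car cyl = 4, vs = 0 group.

```r
library(dplyr)
library(purrr)
library(tidyr)

grouping_vals = c("cyl", "vs")

set.seed(1)
train = mtcars
noise = sample(1:5, 32, replace = TRUE)
test = mtcars %>% mutate(hp = hp * noise)

# One row per group: the grouping values, that group's training rows, and its lm.
fitted_models = train %>%
  group_by(across(all_of(grouping_vals))) %>%
  nest() %>%
  ungroup() %>%
  mutate(linear_model1 = map(data, ~ lm(mpg ~ hp, data = .x)))

# Nest the test set the same way so each group's test rows sit in one cell.
test_nested = test %>%
  group_by(across(all_of(grouping_vals))) %>%
  nest() %>%
  ungroup() %>%
  rename(test_data = data)

# Joining on the grouping columns pairs each model with its own test rows;
# map2() walks the two list columns in parallel, then unnest() flattens back.
test_preds = fitted_models %>%
  select(-data) %>%
  inner_join(test_nested, by = grouping_vals) %>%
  mutate(pred = map2(linear_model1, test_data, ~ predict(.x, newdata = .y))) %>%
  select(-linear_model1) %>%
  unnest(c(test_data, pred))

test_preds
```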
models$linear_model1[[2]]$residuals shows me the residuals for the second group of model 1. I just don't know how to move, say, all of the models$linear_model1 residuals over to the train dataset.
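For the residuals, one option is to compute them inside a nested table and unnest them together with the training rows, since residuals() and fitted() return one value per row in the order of that group's data. A minimal sketch, assuming dplyr ≥ 1.0, tidyr ≥ 1.0 and tibble; the car column, train_with_resid, resid1 and fitted1 are illustrative names.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

grouping_vals = c("cyl", "vs")

# Keep the car names as an ordinary column so they survive nesting and unnesting.
train = rownames_to_column(mtcars, "car")

train_with_resid = train %>%
  group_by(across(all_of(grouping_vals))) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    linear_model1 = map(data, ~ lm(mpg ~ hp, data = .x)),
    # One residual and one fitted value per training row, in data's row order.
    resid1  = map(linear_model1, residuals),
    fitted1 = map(linear_model1, fitted)
  ) %>%
  select(-linear_model1) %>%
  # Unnesting these columns together puts each value back on its own car's row.
  unnest(c(data, resid1, fitted1))

train_with_resid %>% select(car, all_of(grouping_vals), mpg, fitted1, resid1)
```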
My understanding is that tidyr's nest() does the same thing that happens when I build the models with do():
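A quick way to see what nest() actually produces, and compare it with the do() result, is to look at the data list column directly. A small sketch, assuming dplyr ≥ 1.0 and tidyr ≥ 1.0; nested and group_size are illustrative names. One thing it shows: the grouping columns sit in the outer table rather than inside each tibble, so mpg ~ . fitted on data does not use cyl or vs. Whether the "." inside do() includes them is worth checking before comparing coefficients from the two approaches.

```r
library(dplyr)
library(purrr)
library(tidyr)

grouping_vals = c("cyl", "vs")

# One row per cyl/vs combination; `data` is a list of tibbles, one per group.
nested = mtcars %>%
  group_by(across(all_of(grouping_vals))) %>%
  nest() %>%
  ungroup() %>%
  mutate(group_size = map_int(data, nrow))  # rows in each group's tibble

nested %>% select(all_of(grouping_vals), group_size)

# Columns available to a model fitted on one group's tibble (no cyl, no vs).
names(nested$data[[1]])
```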
library(tidyr)
models_with_nest = train %>%
  group_by(across(all_of(grouping_vals))) %>%
  nest() %>%
  mutate(linear_model2 = purrr::map(data, ~lm(mpg ~ ., data = .)),
         linear_model1 = purrr::map(data, ~lm(mpg ~ hp + disp, data = .)))
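Once both models sit in the nested table, stacking them into one column makes per-group comparisons easier, for example a training RMSE for every (group, model) pair. A minimal sketch, assuming tidyr ≥ 1.0 (pivot_longer() with list columns); model_comparison, model_name and train_rmse are illustrative names. Training RMSE is optimistic: groups with no more rows than coefficients are typically fit exactly and show a training RMSE of about zero, so error on held-out test rows is the fairer comparison.

```r
library(dplyr)
library(purrr)
library(tidyr)

grouping_vals = c("cyl", "vs")
train = mtcars

models_with_nest = train %>%
  group_by(across(all_of(grouping_vals))) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    linear_model2 = map(data, ~ lm(mpg ~ ., data = .x)),
    linear_model1 = map(data, ~ lm(mpg ~ hp + disp, data = .x))
  )

# Stack the two model columns so each row is one (group, model) pair, then
# compute the same accuracy measure for every model.
model_comparison = models_with_nest %>%
  select(-data) %>%
  pivot_longer(c(linear_model1, linear_model2),
               names_to = "model_name", values_to = "model") %>%
  mutate(train_rmse = map_dbl(model, ~ sqrt(mean(residuals(.x)^2))))

model_comparison %>% select(all_of(grouping_vals), model_name, train_rmse)
```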
Again, I'm just looking for an easy way to 'map' these residuals/training predictions back to the training dataset, and then apply the corresponding model to an unseen test dataset like the one I created above.
I hope this isn't confusing. I see a lot of promise here; I just can't figure out how to put it together.
I figure this is something a lot of people would like to do in this more 'automated' way, instead of slowly, step by step.