This is fairly complex, with the grouping and the bootstrapping, so I would probably approach it like this, using map()
two layers deep:
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
year <- rep(2014:2018, length.out=10000)
group <- sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000)
value <- sample(10000, replace=T)
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- tibble(year=year, group=group, value=value, female=female, smoker=smoker)
glm_boot_mods <-
dta %>%
nest(data = c(-year, -group)) %>%
mutate(boots = map(
data,
~ bootstraps(., times = 20) %>%
mutate(model = map(.$splits, ~ glm(smoker ~ female, data = analysis(.x),
family = binomial(link = "probit"))),
preds = map2(model, .$splits, ~predict(.x, newdata = assessment(.y))))
))
glm_boot_mods
#> # A tibble: 35 × 4
#> year group data boots
#> <int> <dbl> <list> <list>
#> 1 2014 1 <tibble [288 × 3]> <bootstraps [20 × 4]>
#> 2 2015 4 <tibble [273 × 3]> <bootstraps [20 × 4]>
#> 3 2016 3 <tibble [301 × 3]> <bootstraps [20 × 4]>
#> 4 2017 2 <tibble [282 × 3]> <bootstraps [20 × 4]>
#> 5 2018 0 <tibble [276 × 3]> <bootstraps [20 × 4]>
#> 6 2014 3 <tibble [279 × 3]> <bootstraps [20 × 4]>
#> 7 2016 2 <tibble [314 × 3]> <bootstraps [20 × 4]>
#> 8 2018 1 <tibble [296 × 3]> <bootstraps [20 × 4]>
#> 9 2014 0 <tibble [304 × 3]> <bootstraps [20 × 4]>
#> 10 2015 6 <tibble [288 × 3]> <bootstraps [20 × 4]>
#> # … with 25 more rows
The first map()
creates the bootstrap resamples for each grouping, and then we go one layer deeper and for each resample fit a model and predict for the heldout observations for that resample. You can see what that looks like inside here for the first group:
glm_boot_mods %>%
head(1) %>%
pull(boots)
#> [[1]]
#> # Bootstrap sampling
#> # A tibble: 20 × 4
#> splits id model preds
#> <list> <chr> <list> <list>
#> 1 <split [288/111]> Bootstrap01 <glm> <dbl [111]>
#> 2 <split [288/93]> Bootstrap02 <glm> <dbl [93]>
#> 3 <split [288/103]> Bootstrap03 <glm> <dbl [103]>
#> 4 <split [288/106]> Bootstrap04 <glm> <dbl [106]>
#> 5 <split [288/109]> Bootstrap05 <glm> <dbl [109]>
#> 6 <split [288/109]> Bootstrap06 <glm> <dbl [109]>
#> 7 <split [288/92]> Bootstrap07 <glm> <dbl [92]>
#> 8 <split [288/111]> Bootstrap08 <glm> <dbl [111]>
#> 9 <split [288/99]> Bootstrap09 <glm> <dbl [99]>
#> 10 <split [288/111]> Bootstrap10 <glm> <dbl [111]>
#> 11 <split [288/102]> Bootstrap11 <glm> <dbl [102]>
#> 12 <split [288/104]> Bootstrap12 <glm> <dbl [104]>
#> 13 <split [288/115]> Bootstrap13 <glm> <dbl [115]>
#> 14 <split [288/111]> Bootstrap14 <glm> <dbl [111]>
#> 15 <split [288/108]> Bootstrap15 <glm> <dbl [108]>
#> 16 <split [288/110]> Bootstrap16 <glm> <dbl [110]>
#> 17 <split [288/110]> Bootstrap17 <glm> <dbl [110]>
#> 18 <split [288/111]> Bootstrap18 <glm> <dbl [111]>
#> 19 <split [288/103]> Bootstrap19 <glm> <dbl [103]>
#> 20 <split [288/109]> Bootstrap20 <glm> <dbl [109]>
Created on 2021-11-02 by the reprex package (v2.0.1)
Notice that there are predictions for the heldout observations for each resample. Depending on what you want to do, you can use unnest()
on the columns of glm_boot_mods
you need to handle next.