Here is an updated dplyr version for stratified sampling when you need a different number of samples from each group (e.g. a 1:5 ratio in my case, but you can specify the n for each group combination).
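library(dplyr)  # group_by(), mutate(), select(), slice_sample(), %>%
library(tidyr)  # nest(), unnest()

set.seed(1)
n <- 1e4
d <- tibble::tibble(age = sample(1:5, n, TRUE),
                    lc = rbinom(n, 1, .5),
                    ants = rbinom(n, 1, .7))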
> d
# A tibble: 10,000 x 3
     age    lc  ants
   <int> <int> <int>
 1     2     0     1
 2     2     1     1
 3     3     1     1
 4     5     0     1
 5     2     0     1
 6     5     0     1
 7     5     1     1
 8     4     1     1
 9     4     1     1
10     1     0     1
# … with 9,990 more rows
There are 10 unique combinations of age/lc:
> d %>% group_by(age, lc) %>% nest()
# A tibble: 10 x 3
# Groups:   age, lc [10]
     age    lc data
   <int> <int> <list>
 1     2     0 <tibble [993 × 1]>
 2     2     1 <tibble [1,026 × 1]>
 3     3     1 <tibble [982 × 1]>
 4     5     0 <tibble [1,012 × 1]>
 5     5     1 <tibble [1,056 × 1]>
 6     4     1 <tibble [940 × 1]>
 7     1     0 <tibble [1,010 × 1]>
 8     1     1 <tibble [1,002 × 1]>
 9     4     0 <tibble [958 × 1]>
10     3     0 <tibble [1,021 × 1]>
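Note that the nested rows above come out in order of first appearance in `d`, not sorted by age/lc. If you just want the group sizes in sorted order, a minimal sketch using `dplyr::count()` (the name `group_sizes` is only for illustration) would be:

```r
library(dplyr)

set.seed(1)
n <- 1e4
d <- tibble::tibble(age = sample(1:5, n, TRUE),
                    lc = rbinom(n, 1, .5),
                    ants = rbinom(n, 1, .7))

# One row per age/lc combination, sorted by the grouping columns,
# with the number of rows in each group in column `n`
group_sizes <- d %>% count(age, lc)
group_sizes
```

This is handy for checking that every group has at least as many rows as you plan to sample from it.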
We can sample a prespecified number of rows from each group of age/lc combinations:
> d %>%
    group_by(age, lc) %>%
    nest() %>%
    ungroup() %>%
    # you must supply one `n` per row of the nested tibble, in the row order
    # shown above (order of first appearance in `d`, not sorted order)
    mutate(n = c(1, 1, 1, 2, 3, 1, 2, 3, 1, 1)) %>%
    mutate(samp = purrr::map2(.x = data, .y = n,
                              .f = function(.x, .y) slice_sample(.data = .x, n = .y))) %>%
    select(-data, -n) %>%
    unnest(samp)
# A tibble: 16 x 3
     age    lc  ants
   <int> <int> <int>
 1     2     0     0
 2     2     1     1
 3     3     1     1
 4     5     0     0
 5     5     0     1
 6     5     1     1
 7     5     1     1
 8     5     1     1
 9     4     1     1
10     1     0     1
11     1     0     1
12     1     1     1
13     1     1     1
14     1     1     0
15     4     0     1
16     3     0     1
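Because the `n` vector above is matched to groups purely by position, it is easy to assign a size to the wrong group. A possible variation, sketched here under my own assumptions, keeps the sizes in a small lookup table and joins it on age/lc instead. The table `sizes`, the column `n_samp`, and the 3-vs-1 sizes are hypothetical, made up for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
n <- 1e4
d <- tibble::tibble(age = sample(1:5, n, TRUE),
                    lc = rbinom(n, 1, .5),
                    ants = rbinom(n, 1, .7))

# Hypothetical lookup table: one row per age/lc combination with the
# number of rows to draw (here 3 where lc == 1, otherwise 1)
sizes <- expand_grid(age = 1:5, lc = 0:1) %>%
  mutate(n_samp = if_else(lc == 1, 3L, 1L))

samp <- d %>%
  group_by(age, lc) %>%
  nest() %>%
  ungroup() %>%
  # match sizes to groups by key, so the row order of the nested tibble no longer matters
  inner_join(sizes, by = c("age", "lc")) %>%
  mutate(samp = map2(data, n_samp, function(df, k) slice_sample(df, n = k))) %>%
  select(-data, -n_samp) %>%
  unnest(samp)

# Check how many rows were actually drawn per group
samp %>% count(age, lc)
```

Keep in mind that `slice_sample()` without `replace = TRUE` returns at most as many rows as the group contains, so a requested size larger than the group is silently truncated.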