I have a large dataset containing multiple groups that I want to sample from. Each group has a certain number of positive cases (response value of 1) and many more negative cases (response value of 0).
For each group, I want to select all the positive cases, plus a random sample of negative cases four times as large as the number of positive cases in that group.
I also need something that runs quickly on a lot of data.
Semi-Update:
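    library(dplyr)

    stratified_sample <- data %>%
      group_by(group) %>%
      # per group: number of positives and the target number of negatives
      mutate(n_pos = sum(response == 1),
             n_neg = 4 * n_pos) %>%
      group_by(group, response) %>%
      # give the rows in each group/response a random order via uniform draws
      mutate(rec_num = n(),
             random_val = runif(n()),
             random_order = rank(random_val)) %>%
      # keep every positive, plus the first n_neg negatives in that random order
      filter(response == 1 | random_order <= n_neg)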
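
Since speed matters, here is a rough sketch of the same rule in data.table, which may or may not be faster than the dplyr pipeline above on real data. It assumes the same two columns, `group` and `response`. The toy table `dt` and the names `keep` and `stratified` are made up for illustration. If a group has fewer than 4x as many negatives as positives, this version simply keeps all of its negatives.

```r
library(data.table)

set.seed(42)  # only so the toy example is reproducible

# Hypothetical toy data: 3 groups, roughly 10% positives in each
dt <- data.table(
  group    = rep(c("a", "b", "c"), each = 1000),
  response = rbinom(3000, 1, 0.1)
)

# Per group, collect the row numbers to keep:
# every positive, plus up to 4x as many randomly chosen negatives
keep <- dt[, {
  pos <- which(response == 1)
  neg <- which(response == 0)
  k   <- min(length(neg), 4L * length(pos))
  # sample positions within neg, then map back to row numbers of dt via .I
  .I[c(pos, neg[sample.int(length(neg), k)])]
}, by = group]$V1

stratified <- dt[keep]

# Quick check: positives and negatives kept per group
stratified[, .(n_pos = sum(response == 1), n_neg = sum(response == 0)), by = group]
```

Collecting row numbers with `.I` and subsetting once at the end avoids building a sub-table per group, which is usually the cheaper pattern when there are many groups.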