I would like to efficiently draw a random sample by group from a data.table, but it should be possible to sample a different proportion from each group.
If I wanted to sample the same fraction, sampling_fraction, from each group, I could take inspiration from this question and its related answer and do something like:
library(data.table)
DT = data.table(a = sample(1:2), b = sample(1:1000,20))
group_sampler <- function(data, group_col, sample_fraction){
  # this function samples a fraction sample_fraction (between 0 and 1) of the rows in each group of the data.table
  # inputs:
  #   data - data.table
  #   group_col - name(s) of the column(s) used to group by, as a character vector
  #   sample_fraction - a value between 0 and 1 indicating what fraction of each group should be sampled
  # the number of rows drawn per group is rounded up, so every group keeps at least one row
  data[,.SD[sample(.N, ceiling(.N*sample_fraction))],by = eval(group_col)]
}
# what % of data should be sampled
sampling_fraction = 0.5
# perform the sampling
sampled_dt <- group_sampler(DT, 'a', sampling_fraction)
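As a quick sanity check, the number of rows kept per group can be compared with the group sizes. The sketch below re-creates the example so that it runs on its own; the seed and the explicit `rep()` for column `a` are my additions for reproducibility, not part of the code above.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the illustration is reproducible
DT <- data.table(a = rep(1:2, times = 10), b = sample(1:1000, 20))

group_sampler <- function(data, group_col, sample_fraction) {
  data[, .SD[sample(.N, ceiling(.N * sample_fraction))], by = eval(group_col)]
}

sampled_dt <- group_sampler(DT, "a", 0.5)

# rows per group before and after sampling
before <- DT[, .(n_total = .N), by = a]
after  <- sampled_dt[, .(n_sampled = .N), by = a]
size_check <- merge(before, after, by = "a")
print(size_check)
```

With 10 rows in each group and a fraction of 0.5, each group should keep ceiling(10 * 0.5) = 5 rows.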
But what if I wanted to sample 10% from group 1 and 50% from group 2?
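To make the goal concrete, here is a rough sketch of the kind of interface I have in mind. It is not necessarily the efficient solution I am after. The lookup table `fractions`, its column `frac`, and the function name `group_sampler_by_fraction` are all hypothetical names introduced for illustration, and the sketch assumes that `data` has no column called `frac` already.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the illustration is reproducible
DT <- data.table(a = rep(1:2, times = 10), b = sample(1:1000, 20))

# hypothetical lookup table: one sampling fraction per group value
fractions <- data.table(a = 1:2, frac = c(0.1, 0.5))

group_sampler_by_fraction <- function(data, fractions, group_col) {
  # attach each row's group fraction by joining on the grouping column(s);
  # groups missing from `fractions` are dropped by this inner join
  with_frac <- merge(data, fractions, by = group_col)
  # frac is constant within a group, so frac[1L] is that group's fraction
  sampled <- with_frac[, .SD[sample(.N, ceiling(.N * frac[1L]))],
                       by = eval(group_col)]
  # drop the helper column before returning
  sampled[, frac := NULL]
  sampled
}

sampled_dt <- group_sampler_by_fraction(DT, fractions, "a")
group_sizes <- sampled_dt[, .N, by = a][order(a)]
print(group_sizes)
```

Here group 1 (10 rows, fraction 0.1) should keep 1 row and group 2 (10 rows, fraction 0.5) should keep 5 rows. The join copies the data once, which may matter for large tables.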