I know dplyr has a sample_n function, but I don't know how to draw a sample with group-specific weights.

For example:

iris %>%
  group_by(Species) %>%
  sample_n(size = 3)

This returns 3 observations from each group (9 in total).

But I want 30 observations in total, with 70% drawn from group 1, 20% from group 2, and 10% from group 3, for example.

Thanks in advance.

Samet Sökel
  • Where does the weight come in? – Hugh Jun 15 '22 at 09:18
  • Does this answer your question? [Sample from a data frame using group-specific sample sizes](https://stackoverflow.com/questions/66476142/sample-from-a-data-frame-using-group-specific-sample-sizes) – KoenV Jun 15 '22 at 09:18
  • Was your issue solved by the answer I sent in below? – jpenzer Jun 24 '22 at 11:26

1 Answer

Borrowing from the link KoenV has posted in the comments:

library(dplyr)
library(purrr)

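sample_size <- 30
# Proportions per group, in the order group_split() returns the groups
# (setosa, versicolor, virginica)
groups <- c(0.7, 0.1, 0.2)
group_size <- sample_size * groups  # 21, 3, 6

iris %>%
  group_split(Species) %>%
  # Draw the matching number of rows from each group and row-bind the results
  map2_dfr(group_size, ~ slice_sample(.x, n = .y))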

# A tibble: 30 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          4.8         3.1          1.6         0.2 setosa 
 2          4.8         3.4          1.6         0.2 setosa 
 3          5.1         3.4          1.5         0.2 setosa 
 4          4.4         3            1.3         0.2 setosa 
 5          4.6         3.4          1.4         0.3 setosa 
 6          5.5         4.2          1.4         0.2 setosa 
 7          5.5         3.5          1.3         0.2 setosa 
 8          4.9         3            1.4         0.2 setosa 
 9          5.1         3.8          1.9         0.4 setosa 
10          5.7         4.4          1.5         0.4 setosa 

Rows per species in the sampled data:

# A tibble: 3 × 2
  Species        n
  <fct>      <int>
1 setosa        21
2 versicolor     3
3 virginica      6
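
The positional vector above only works when the proportions are listed in the same order in which group_split() returns the groups. As a hedged variation (not part of the linked answer), the proportions can be keyed by species name so that the order no longer matters. The names props and sampled, the fixed seed, and the use of round() are assumptions made for this sketch; the 70/20/10 split follows the question.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed, only so the draw is reproducible

sample_size <- 30

# Hypothetical named vector: proportions keyed by group name, not by position
props <- c(setosa = 0.7, versicolor = 0.2, virginica = 0.1)
stopifnot(isTRUE(all.equal(sum(props), 1)))

# Round so that every per-group size is a whole number (names are kept)
group_size <- round(sample_size * props)

sampled <- iris %>%
  group_split(Species) %>%
  map_dfr(function(df) {
    # Look up this group's size by its species name
    key <- as.character(df$Species[1])
    slice_sample(df, n = group_size[[key]])
  })

# Check the resulting group sizes
count(sampled, Species)
```

With iris, count() reports 21 setosa, 6 versicolor and 3 virginica rows. If the proportions do not divide the sample size evenly, round() can make the total differ slightly from sample_size, so it is worth checking sum(group_size) before sampling.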
jpenzer