
Novice coder here!

I am trying to sample 50 rows from a data frame with about 27,000 rows. I have tried this:

sample <- df %>%
  sample_n(50)

but I get this error message:

Error: "size" must be less or equal than 1 (size of data), set "replace" = TRUE to use sampling with replacement.

It seems like the error is that the data frame is smaller than the sample size, which is obviously not the case. How can I fix this?

Ronak Shah
jcow11
    [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. You say it's obviously not the case that the data frame is smaller than the sample size, but we don't know that for sure without access to a representative sample of data – camille May 27 '21 at 01:11

2 Answers


Check dim(df) to confirm that it’s actually 27k rows. This isn’t tidyverse, but it will work:

index <- sample.int(nrow(df), 50)
sample <- df[index, ]
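
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The data frame below is a made-up stand-in (the `id` and `value` columns are assumptions, not the asker's data), and the result is stored as `sampled` rather than `sample` only to avoid reusing the name of the base function.

```r
# Hypothetical stand-in for the asker's data: 27,000 rows, two made-up columns
set.seed(42)  # fixed seed so the sample is reproducible
df <- data.frame(id = seq_len(27000), value = rnorm(27000))

# First confirm the data frame really has the expected number of rows
dim(df)

# Draw 50 distinct row indices, then subset the data frame with them
index <- sample.int(nrow(df), 50)
sampled <- df[index, ]

nrow(sampled)
```

If dim(df) reports far fewer rows than expected, the problem is upstream of the sampling step.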
Ronak Shah
zeejay

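That is most likely because your data is grouped. sample_n() samples within each group, so the requested size is checked against the number of rows in a group rather than in the whole data frame. Let's take mtcars as an example -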

library(dplyr)
mtcars %>% group_by(cyl) %>% sample_n(30)

Error: size must be less or equal than 11 (size of data), set replace = TRUE to use sampling with replacement.

Try ungrouping your data before using sample_n -

sample <- df %>% ungroup() %>% sample_n(50)

If you have a recent dplyr (>= 1.0.0), you can use slice_sample(), which supersedes sample_n() -

sample <- df %>% ungroup() %>% slice_sample(n = 50)
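
To check whether grouping is really the cause before changing anything, something like the following sketch may help. It uses mtcars as a stand-in for your data, and the sample size of 20 is only chosen to fit mtcars (32 rows); the name `sampled` is illustrative.

```r
library(dplyr)

# mtcars grouped by cyl, standing in for a data frame that was grouped earlier
grouped <- mtcars %>% group_by(cyl)

# Inspect the grouping: which columns define it and how large each group is
is_grouped_df(grouped)  # TRUE
group_vars(grouped)     # "cyl"
group_size(grouped)     # 11 7 14

# Drop the grouping first, then sample from the whole data frame
sampled <- grouped %>%
  ungroup() %>%
  slice_sample(n = 20)

nrow(sampled)           # 20
```

If group_size() on your own data shows groups of size 1, that would match the "size of data" reported in your error message.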
Ronak Shah