How to make random sample from dataframe on unique values in R

Asked Jul 16 '19 at 09:56

I have a dataframe from which I want to draw a random sample: not just any sample, but one that contains exactly one randomly sampled row for each unique value in the column word:

set.seed(123)
df <- data.frame(
  word = sample(LETTERS[1:5], 50, replace = TRUE),
  value = sample(1:10, 50, replace = TRUE)
)
head(df)
  word value
1    B     1
2    D     5
3    C     8
4    E     2
5    E     6
6    A     3

What I've done to solve this problem is this:

1. Store unique words in a vector:

UniqueWords <- unique(df$word)

2. Set up a for loop:

for(i in UniqueWords){
  df_sample[i,] <- df[sample(1:nrow(df[df$word==UniqueWords[i], ]), 1), ]
}

The loop, however, does not produce the correct result. How can it be tweaked or, alternatively, what other method can be used?
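
For what it's worth, the loop appears to fail for three reasons: `df_sample` is never created before rows are assigned to it; `i` already holds the word itself, so `UniqueWords[i]` looks up an element by name and returns NA; and the sampled number is a position within the subset but is used to index the whole `df`. A minimal sketch of a repaired loop, assuming base R only (the names `unique_words`, `picked`, and `rows` are illustrative and not from the original code):

```r
set.seed(123)
df <- data.frame(
  word = sample(LETTERS[1:5], 50, replace = TRUE),
  value = sample(1:10, 50, replace = TRUE),
  stringsAsFactors = FALSE  # assumption: keep word as character in every R version
)

unique_words <- unique(df$word)

# One slot per unique word; each will hold a row number of df
picked <- integer(length(unique_words))

for (k in seq_along(unique_words)) {
  # Row numbers in the full df that belong to the k-th word
  rows <- which(df$word == unique_words[k])
  # Pick one position within rows, then map it back to a row number of df
  picked[k] <- rows[sample.int(length(rows), 1)]
}

# Exactly one randomly chosen row per unique word
df_sample <- df[picked, ]
```

Collecting row numbers first and subsetting once at the end avoids having to pre-allocate `df_sample` with the right columns.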

asked Jul 16 '19 at 09:56

Chris Ruehlemann

Do you want to get one random row for each `word`? `df %>% group_by(word) %>% sample_n(1)` – Ronak Shah Jul 16 '19 at 09:59
Correct. Since there are 5 different words, the desired df should contain 5 rows. In which package is `%>%`? – Chris Ruehlemann Jul 16 '19 at 10:03
okay...that was a `dplyr` solution. There are many more alternatives in the linked target. – Ronak Shah Jul 16 '19 at 10:05
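
To make the one-liner from the comments concrete: `%>%` is the pipe operator from the magrittr package, which dplyr re-exports, so loading dplyr is enough. A minimal sketch, assuming dplyr is installed and reusing the example data from the question (the `ungroup()` call and the `stringsAsFactors = FALSE` argument are additions for tidiness and are not part of the comment):

```r
library(dplyr)

set.seed(123)
df <- data.frame(
  word = sample(LETTERS[1:5], 50, replace = TRUE),
  value = sample(1:10, 50, replace = TRUE),
  stringsAsFactors = FALSE  # assumption: keep word as character
)

df_sample <- df %>%
  group_by(word) %>%  # one group per unique word
  sample_n(1) %>%     # draw one random row within each group
  ungroup()           # drop the grouping so later steps see a plain tibble
```

In dplyr 1.0.0 and later, `slice_sample(n = 1)` can be used in place of `sample_n(1)`, which is marked as superseded.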
