1

I have a two-column data.frame that looks a little like this:

df <- data.frame(Name = paste0(letters[1:12], 1:12), Group = 1:3)

What I would like to do is randomly select, for example, 2 values (without replacement) from 'Name' and store them in a character vector, then select two other values and store them in another vector, and so on. The requirement is that values sampled together from 'Name' must share the same value in 'Group'.

Is there a fast way of doing this? I could manually create vectors based on a sample of n = 2, then update the contents of the original df, and sample again. But I would love to see someone suggest a more elegant version. Maybe I could store the sampled values in a list?

Thanks in advance.

Maël
HernanLG
  • [Take random sample by group](https://stackoverflow.com/questions/18258690/take-random-sample-by-group) – Henrik Sep 05 '22 at 14:41

3 Answers

4

You can use slice_sample:

library(dplyr)
df %>% 
  group_by(Group) %>% 
  slice_sample(n = 2)
  Name  Group
  <chr> <int>
1 a1        1
2 j10       1
3 e5        2
4 b2        2
5 c3        3
6 l12       3

or group_map to get a list:

library(dplyr)
df %>% 
  group_by(Group) %>% 
  group_map(~ sample(.x$Name, 2))
[[1]]
[1] "d4" "a1"

[[2]]
[1] "b2" "e5"

[[3]]
[1] "c3" "i9"

or in base R (the native pipe |> requires R 4.1 or later):

split(df$Name, df$Group) |>
  lapply(function(x) sample(x, 2))
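
On R versions older than 4.1 the native pipe is a syntax error, which is likely what produces the "unexpected '>'" message reported in the comment below. A minimal sketch of the same split-then-sample idea without the pipe, assuming the toy data from the question; the name `picks` and the `set.seed` call are only for illustration:

```r
# Same toy data as in the question: 12 names, Group 1:3 recycled
df <- data.frame(Name = paste0(letters[1:12], 1:12), Group = 1:3)

set.seed(1)  # assumption: only here to make the draw reproducible

# Split the names by group, then draw 2 per group without replacement
picks <- lapply(split(df$Name, df$Group), function(x) sample(x, 2))
picks
```

The result is a named list with one character vector of length 2 per group, equivalent to the piped version above.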
Maël
  • Thanks. The group_by works, but I get only 3 groups. How can I sample until there are no more group members? And for your base R example, I got an Error: unexpected '>' message. – HernanLG Sep 07 '22 at 08:33
4

A base R option using by + sample

> with(df,  by(Name, Group, sample, 2))
Group: 1
[1] "g7" "d4"
------------------------------------------------------------
Group: 2
[1] "b2"  "k11"
------------------------------------------------------------
Group: 3
[1] "i9" "f6"

or, for a more compact result, aggregate:

> aggregate(. ~ Group, df, sample, 2, simplify = FALSE)
  Group    Name
1     1 j10, a1
2     2 k11, b2
3     3 l12, c3
ThomasIsCoding
  • Thanks. This works, but I noticed that all the code in these answers always gives me 3 groups. How can I sample until there are no more elements? – HernanLG Sep 07 '22 at 08:31
  • @HernanLG What do you mean "sample until there are no more elements"? do you have an example output for that? – ThomasIsCoding Sep 07 '22 at 12:18
  • Sorry, I just realized the original question wasn't specific enough. The current code is selecting one sample for each group, but I wanted a code that would select more than one sample per group. Specifically, sample names without replacement from each group until there are no more names in that group. However, I realize that would require a more complicated loop that updates the data.frame, creates tables, etc. So never mind this last requirement. – HernanLG Sep 07 '22 at 13:29
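
The "sample until there are no more names" requirement from the comments may not need a loop that updates the data.frame. One possible sketch: shuffle each group once (a random permutation is a sample without replacement of the whole group) and cut the shuffled names into chunks of size n. The helper `split_into_samples` and the name `pairs_by_group` are hypothetical, introduced only for this illustration:

```r
# Same toy data as in the question: 12 names, Group 1:3 recycled
df <- data.frame(Name = paste0(letters[1:12], 1:12), Group = 1:3)

# Hypothetical helper: shuffle one group's names, then cut them into chunks of size n
split_into_samples <- function(x, n = 2) {
  shuffled <- sample(x)                         # random permutation of the group
  chunk_id <- ceiling(seq_along(shuffled) / n)  # 1, 1, 2, 2, ... for n = 2
  unname(split(shuffled, chunk_id))             # list of vectors of length n
}

set.seed(42)  # assumption: only here to make the draw reproducible

# One list per group; each element is one sample of n names from that group
pairs_by_group <- lapply(split(df$Name, df$Group), split_into_samples, n = 2)
pairs_by_group[["1"]]
```

With this data every group has 4 names, so each group yields 2 disjoint pairs. If a group size is not a multiple of n, the last chunk is simply shorter.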
2

Using data.table

library(data.table)
setDT(df)[df[, sample(.I, 2), Group]$V1]
     Name Group
   <char> <int>
1:    j10     1
2:     g7     1
3:     b2     2
4:    k11     2
5:     i9     3
6:     c3     3
akrun