Sample vectors from a larger vector in R

Question

I have a two-column data.frame that looks a little like this:

df <- data.frame(Name = rep(paste(letters[1:12],1:12,sep = ""),1),Group = 1:3)

What I would like to do is randomly select, for example, 2 values (without replacement) from 'Name' and store them in a character vector, then select two other values and store them in another vector, and so on. The requirement is that the values sampled together from 'Name' must have the same value in 'Group'.

Is there a fast way of doing this? I could manually create vectors based on a sample of n=2, then update the contents of the original df, and sample again. But I would love to see someone suggest a more elegant version. Maybe I could store the sampled values in a list?

Thanks in advance.

[Take random sample by group](https://stackoverflow.com/questions/18258690/take-random-sample-by-group) — Henrik, Sep 05 '22 at 14:41

Maël · Answer 1 · 2022-09-05T13:57:32.980

4

You can use slice_sample:

library(dplyr)
df %>% 
  group_by(Group) %>% 
  slice_sample(n = 2)

  Name  Group
  <chr> <int>
1 a1        1
2 j10       1
3 e5        2
4 b2        2
5 c3        3
6 l12       3

or group_map to get a list:

library(dplyr)
df %>% 
  group_by(Group) %>% 
  group_map(~ sample(.x$Name, 2))

[[1]]
[1] "d4" "a1"

[[2]]
[1] "b2" "e5"

[[3]]
[1] "c3" "i9"

or in base R (the native pipe |> requires R 4.1.0 or later):

split(df$Name, df$Group) |>
  lapply(function(x) sample(x, 2))
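On R versions older than 4.1.0 the `|>` operator is a syntax error, which is a likely cause of an "unexpected '>'" message. A minimal sketch of the same call written without the pipe is shown here; the `set.seed()` call and the name `res` are additions for illustration only and are not part of the original answer.

```r
# Example data from the question (Group 1:3 is recycled across the 12 names)
df <- data.frame(Name = paste0(letters[1:12], 1:12), Group = 1:3)

set.seed(42)  # optional: only makes the random draw reproducible

# Same logic as the piped version, written as nested calls:
# split Name by Group, then draw 2 names from each group
res <- lapply(split(df$Name, df$Group), function(x) sample(x, 2))
res
```

The result is a list with one element per group, each holding two names drawn without replacement from that group.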

edited Sep 05 '22 at 13:57

answered Sep 05 '22 at 13:51

Maël


Thanks. The group_by works, but I get only 3 groups. How can I sample until there are no more group members? And for your base R example, I got an Error: unexpected '>' message. – HernanLG Sep 07 '22 at 08:33

score 4 · Accepted Answer · answered Sep 05 '22 at 13:55

A base R option using by + sample

> with(df,  by(Name, Group, sample, 2))
Group: 1
[1] "g7" "d4"
------------------------------------------------------------
Group: 2
[1] "b2"  "k11"
------------------------------------------------------------
Group: 3
[1] "i9" "f6"

or a more compact result using aggregate

> aggregate(. ~ Group, df, sample, 2, simplify = FALSE)
  Group    Name
1     1 j10, a1
2     2 k11, b2
3     3 l12, c3

ThomasIsCoding

Thanks. This works, but I noticed that all the code in these answers always gives me 3 groups. How can I sample until there are no more elements? – HernanLG Sep 07 '22 at 08:31
@HernanLG What do you mean "sample until there are no more elements"? do you have an example output for that? – ThomasIsCoding Sep 07 '22 at 12:18
Sorry, I just realized the original question wasn't specific enough. The current code selects one sample for each group, but I wanted code that would select more than one sample per group. Specifically, it should sample names without replacement from each group until there are no more names in that group. However, I realize that would require a more complicated loop that updates the data.frame, creates tables, etc. So never mind this last requirement. – HernanLG Sep 07 '22 at 13:29
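
The "sample until no names are left" requirement may not need a loop that updates the data.frame. Drawing n names without replacement again and again is equivalent to shuffling the group once and cutting it into consecutive chunks of n. The following is a minimal base R sketch under that assumption; `sample_chunks`, `pairs_by_group`, and the `set.seed()` call are hypothetical names and additions for illustration, not something proposed in the answers.

```r
# Example data from the question (Group 1:3 is recycled across the 12 names)
df <- data.frame(Name = paste0(letters[1:12], 1:12), Group = 1:3)

# Hypothetical helper: shuffle x once, then cut it into consecutive chunks of size n.
# If length(x) is not a multiple of n, the last chunk is shorter.
sample_chunks <- function(x, n = 2) {
  shuffled <- x[sample.int(length(x))]
  unname(split(shuffled, ceiling(seq_along(shuffled) / n)))
}

set.seed(1)  # optional: only makes the random draw reproducible

# One list element per group; each element is a list of character vectors of length n
pairs_by_group <- lapply(split(df$Name, df$Group), sample_chunks, n = 2)
pairs_by_group
```

With the example data each group has four names, so every group yields two pairs and each name appears exactly once. Calling `unlist(pairs_by_group, recursive = FALSE)` flattens the result into a single list of pairs if the group level is not needed.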

score 2 · Answer 3 · answered Sep 05 '22 at 14:37

Using data.table

library(data.table)
setDT(df)[df[, sample(.I, 2), Group]$V1]
     Name Group
   <char> <int>
1:    j10     1
2:     g7     1
3:     b2     2
4:    k11     2
5:     i9     3
6:     c3     3

akrun
