Resampling groups of rows in R

Question

Is there a faster way to resample groups (with replacement) in a dataset using R?

Edit: Note that I would like to resample groups of rows, not individual rows.

toydata <- data.frame(
  group = rep(letters[1:3], each = 2),
  rep   = rep(1:2, times = 3),
  value = 1:6)

print(toydata)

  group rep value
1     a   1     1
2     a   2     2
3     b   1     3
4     b   2     4
5     c   1     5
6     c   2     6

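library(dplyr)  # provides n_distinct() and left_join()

ngroups <- n_distinct(toydata$group)
nreps   <- nrow(toydata) / ngroups  # assumes every group has the same number of rows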

s <- sample(unique(toydata$group), replace = TRUE)  # resampling groups with replacement
toydata_resampled <- left_join(
  x  = data.frame(group = rep(s, each = nreps), rep = rep(1:nreps, ngroups)),
  y  = toydata,
  by = c("group", "rep"))

One expected output:

> print(toydata_resampled)
  group rep value
1     a   1     1
2     a   2     2
3     a   1     1
4     a   2     2
5     c   1     5
6     c   2     6

I am not sure if the question you asked is similar to [the post](https://stackoverflow.com/questions/8273313/sample-random-rows-in-dataframe) — Denny Chen, Feb 24 '22 at 08:39
It is similar, except that I would like to resample groups of rows, not individual rows. — Marco, Feb 24 '22 at 09:03
Is the second output a possible expected output? Why is your code not enough? — Maël, Feb 24 '22 at 09:20
Yes, it is one possible output. The code works, but it is pretty slow (this resampling step will be repeated many, many times in my pipeline). — Marco, Feb 24 '22 at 09:23

Maël · Accepted Answer · 2022-02-24T09:45:40.753

1

Split your data frame by group, sample the resulting list with replacement, and bind the pieces back into a data.frame.

set.seed(1)
do.call(rbind, sample(split(toydata, toydata$group), replace = TRUE))

output

     group rep value
a.1      a   1     1
a.2      a   2     2
c.5      c   1     5
c.6      c   2     6
a.11     a   1     1
a.21     a   2     2
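
Since the resampling step is repeated many times, one possible variation is to split only the row indices once, outside the loop, and then index the data frame on each draw. This is a sketch rather than a benchmarked result: `rows_by_group` and `resample_groups` are hypothetical names introduced here, and whether it is actually faster should be measured on the real data.

```r
# Toy data from the question.
toydata <- data.frame(
  group = rep(letters[1:3], each = 2),
  rep   = rep(1:2, times = 3),
  value = 1:6)

# Row indices per group, computed once and reused for every resample.
rows_by_group <- split(seq_len(nrow(toydata)), toydata$group)

# Hypothetical helper: draw groups with replacement and return all of
# their rows, keeping each group's rows together.
resample_groups <- function(data, rows_by_group) {
  picked <- sample(length(rows_by_group), replace = TRUE)
  data[unlist(rows_by_group[picked], use.names = FALSE), , drop = FALSE]
}

set.seed(1)
toydata_resampled <- resample_groups(toydata, rows_by_group)
print(toydata_resampled)
```

Unlike the join in the question, this does not require every group to have the same number of rows.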

edited Feb 24 '22 at 09:45

answered Feb 24 '22 at 09:26


score 0 · Answer 2 · answered Feb 24 '22 at 09:46

Is this faster?

iter <- 10                    # number of groups to draw
uq <- unique(toydata$group)   # candidate groups, computed once
for (i in 1:iter){
  # draw one group at random and keep all of its rows
  g <- sample(uq, 1)
  picked <- subset(toydata, group == g)
  if (i == 1){
    output <- picked
  } else {
    output <- rbind(output, picked)
  }
}

output

> output
    group rep value
 1:     a   1     1
 2:     a   2     2
 3:     b   1     3
 4:     b   2     4
 5:     c   1     5
 6:     c   2     6
 7:     c   1     5
 8:     c   2     6
 9:     a   1     1
10:     a   2     2
11:     a   1     1
12:     a   2     2
13:     c   1     5
14:     c   2     6
15:     b   1     3
16:     b   2     4
17:     c   1     5
18:     c   2     6
19:     a   1     1
20:     a   2     2
21:     c   1     5
22:     c   2     6
23:     b   1     3
24:     b   2     4
25:     a   1     1
26:     a   2     2
    group rep value

I am not sure whether this is faster than `do.call` with `rbind`; I think @Mael's answer will be faster. I am offering this as an alternative. — Denny Chen, Feb 24 '22 at 09:47
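
The loop above grows `output` with `rbind` on every iteration, which copies the accumulated rows each time. A minimal sketch of the same idea, assuming it is acceptable to draw all groups up front, collects the pieces in a list and binds them once. The names `picked` and `pieces` are illustrative only.

```r
# Toy data from the question.
toydata <- data.frame(
  group = rep(letters[1:3], each = 2),
  rep   = rep(1:2, times = 3),
  value = 1:6)

iter <- 10
uq <- unique(toydata$group)

set.seed(1)
# Draw all groups in one call instead of one per iteration.
picked <- sample(uq, iter, replace = TRUE)

# One data frame per drawn group, bound together in a single rbind.
pieces <- lapply(picked, function(g) toydata[toydata$group == g, , drop = FALSE])
output <- do.call(rbind, pieces)
```

With `iter` draws of two-row groups, `output` has `2 * iter` rows.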
