I am trying to find a way to efficiently replicate rows of a matrix in R based on a group. Let's say I have the following matrix a:

a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)

I want to create a new matrix where each row in a is repeated based on a number specified in a vector (what I'm calling a "group"), e.g.:

reps <- c(2, 3, 4)

In this case, the resulting matrix would be:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    4    5    6
[4,]    4    5    6
[5,]    4    5    6
[6,]    7    8    9
[7,]    7    8    9
[8,]    7    8    9
[9,]    7    8    9

This is the only solution I've come up with so far:

matrix(
  rep(a, times = rep(reps, times = 3)), 
  ncol = 3, byrow = FALSE
)

Notice that in this solution I have to use rep() twice - first to replicate the reps vector, and then again to actually replicate each row of a.
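
To spell out why the nested rep() works: R stores matrices in column-major order, so rep(a, times = ...) walks down column 1, then column 2, and so on. Repeating reps once per column therefore gives every element the repeat count of its row. A small sketch using the a and reps defined above (times_per_element and expanded are names introduced here for illustration, and ncol(a) stands in for the hard-coded 3):

```r
a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)
reps <- c(2, 3, 4)

# One repeat count per element, in column-major order:
# 2 3 4 (column 1), 2 3 4 (column 2), 2 3 4 (column 3)
times_per_element <- rep(reps, times = ncol(a))

# Each column expands to sum(reps) values, so filling column-wise
# with ncol(a) columns rebuilds the matrix with repeated rows.
expanded <- matrix(rep(a, times = times_per_element), ncol = ncol(a))
```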

This solution works, but I'm looking for something more efficient: in my case the replication runs inside an optimization loop, once per iteration, and it is rather slow when a is large.

I'll note that this question is very similar, but it is about repeating each row the same number of times. This question is also similarly about efficiency, but it's about replicating entire matrices.

UPDATE

Since I'm interested in efficiency, here is a simple comparison of the solutions provided so far. I'll update this as more come in, but in general it looks like the seq_along solution by F. Privé is the fastest, though on this small example only marginally faster than my original.

library(dplyr)
library(tidyr)

a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)

rbenchmark::benchmark(
  "original solution" = {
    result <- matrix(rep(a, times = rep(reps, times = 3)),
      ncol = 3, byrow = FALSE)
  },
  "seq_along" = {
    result <- a[rep(seq_along(reps), reps), ]
  },
  "uncount" = {
    result <- as.data.frame(a) %>%
      uncount(reps)
  },
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
)
               test replications elapsed relative
1 original solution         1000   0.004    1.333
2         seq_along         1000   0.003    1.000
3           uncount         1000   1.722  574.000
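
The timings above come from a 3 x 3 matrix, where fixed call overhead is likely to dominate. A rough sketch of how the two base R approaches could be compared on a larger input follows; the sizes, the random data, and the helper names via_rep and via_index are all assumptions made for illustration, and no timings are claimed here.

```r
set.seed(1)

# Hypothetical problem size; adjust to match the real workload.
n_row <- 2000
n_col <- 50
big_a <- matrix(runif(n_row * n_col), ncol = n_col)
big_reps <- sample(1:5, n_row, replace = TRUE)

# Original approach: repeat every element, then rebuild the matrix.
via_rep <- function(m, r) {
  matrix(rep(m, times = rep(r, times = ncol(m))), ncol = ncol(m))
}

# Index approach: repeat row indices, then subset once.
via_index <- function(m, r) {
  m[rep(seq_along(r), r), , drop = FALSE]
}

# Both approaches should produce exactly the same matrix.
stopifnot(identical(via_rep(big_a, big_reps), via_index(big_a, big_reps)))

# Time 20 repetitions of each with base R only.
time_rep <- system.time(for (i in 1:20) via_rep(big_a, big_reps))
time_index <- system.time(for (i in 1:20) via_index(big_a, big_reps))
```

Comparing time_rep["elapsed"] with time_index["elapsed"] on realistic sizes gives a better picture than the toy example.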
jhelvy

3 Answers

Simply use a[rep(seq_along(reps), reps), ].
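
To spell out what this does: rep(seq_along(reps), reps) builds a vector of row indices in which index i appears reps[i] times, and subsetting a with that vector pulls out the rows in that order. A minimal sketch reusing a and reps from the question (idx and result are names introduced here; drop = FALSE is an addition that only matters if a could have a single column):

```r
a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)
reps <- c(2, 3, 4)

# Row index i repeated reps[i] times: 1 1 2 2 2 3 3 3 3
idx <- rep(seq_along(reps), reps)

# Subset the rows once; drop = FALSE keeps the result a matrix
# even when a has only one column.
result <- a[idx, , drop = FALSE]
```

Because only the short index vector is repeated rather than every element of a, there is no need to rebuild the matrix with matrix() afterwards.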

F. Privé

Another option with uncount

library(dplyr)
library(tidyr)
as.data.frame(a) %>%
 uncount(reps)

-output

  V1 V2 V3
1  1  2  3
2  1  2  3
3  4  5  6
4  4  5  6
5  4  5  6
6  7  8  9
7  7  8  9
8  7  8  9
9  7  8  9
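
Note that uncount() returns a data frame rather than a matrix. If a matrix is needed, as in the question, one possible sketch is to convert back afterwards; this assumes the tidyr package is installed, and df_rep and m_rep are names introduced here for illustration:

```r
library(tidyr)

a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)
reps <- c(2, 3, 4)

# Repeat each row of the data frame according to reps.
df_rep <- uncount(as.data.frame(a), reps)

# Convert back to a plain matrix and drop the V1..V3 column names.
m_rep <- unname(as.matrix(df_rep))
```

The round trip through a data frame adds overhead, so this is mostly a convenience when the data already lives in a data frame.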
akrun
    learned `uncount` from your answer, cheers! – ThomasIsCoding Jul 31 '21 at 20:29
    @ThomasIsCoding Great solution [here](https://stackoverflow.com/questions/68603750/is-there-an-r-package-to-calculate-1st-order-transition-matrix-from-a-frequency/68604852#68604852) – akrun Jul 31 '21 at 20:30

Another base R option (not as elegant as the answers by @F. Privé or @akrun)

> t(do.call(cbind, mapply(replicate, reps, asplit(a, 1))))
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    1    2    3
 [3,]    4    5    6
 [4,]    4    5    6
 [5,]    4    5    6
 [6,]    7    8    9
 [7,]    7    8    9
 [8,]    7    8    9
 [9,]    7    8    9
ThomasIsCoding