I am trying to find a way to efficiently replicate rows of a matrix in R based on a group. Let's say I have the following matrix a:
a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)
I want to create a new matrix where each row in a is repeated based on a number specified in a vector (what I'm calling a "group"), e.g.:
reps <- c(2, 3, 4)
In this case, the resulting matrix would be:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    4    5    6
[4,]    4    5    6
[5,]    4    5    6
[6,]    7    8    9
[7,]    7    8    9
[8,]    7    8    9
[9,]    7    8    9
This is the only solution I've come up with so far:
matrix(
  rep(a, times = rep(reps, times = 3)),
  ncol = 3, byrow = FALSE
)
Notice that in this solution I have to use rep() twice: first to replicate the reps vector, and then again to actually replicate each row of a.
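To make the two rep() calls concrete, here is a small sketch of what each step produces. It relies on R storing matrices in column-major order, and the intermediate names (times_per_element, expanded) are only for illustration; ncol(a) stands in for the hard-coded 3.

```r
a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)
reps <- c(2, 3, 4)

# Internally, a is stored column by column: 1 4 7 2 5 8 3 6 9.
# Repeating reps once per column yields one count per element of a.
times_per_element <- rep(reps, times = ncol(a))  # 2 3 4 2 3 4 2 3 4

# Repeat every element by the count belonging to its row.
expanded <- rep(a, times = times_per_element)

# Refill column-wise; each column now holds sum(reps) entries.
result <- matrix(expanded, ncol = ncol(a), byrow = FALSE)
```

Each column of result is the matching column of a with its entries repeated according to reps, which is why the counts have to be recycled once per column.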
This solution works fine, but I'm looking for a more efficient one: in my case it runs on every iteration of an optimization loop, and it is rather slow when a is large.
I'll note that this question is very similar, but it is about repeating each row the same number of times. This question is also similarly about efficiency, but it's about replicating entire matrices.
UPDATE
Since I'm interested in efficiency, here is a simple comparison of the solutions provided so far. I'll update this as more come in, but in general it looks like the seq_along solution by F. Privé is the fastest.
library(dplyr)
library(tidyr)
a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)
rbenchmark::benchmark(
  "original solution" = {
    result <- matrix(rep(a, times = rep(reps, times = 3)),
                     ncol = 3, byrow = FALSE)
  },
  "seq_along" = {
    result <- a[rep(seq_along(reps), reps), ]
  },
  "uncount" = {
    result <- as.data.frame(a) %>%
      uncount(reps)
  },
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
)
               test replications elapsed relative
1 original solution         1000   0.004    1.333
2         seq_along         1000   0.003    1.000
3           uncount         1000   1.722  574.000
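Before reading too much into the timings, it may be worth confirming that the two matrix-based approaches return the same object. The sketch below wraps them in two helper functions whose names (rep_rows_original, rep_rows_index) are hypothetical, and also checks them on a larger random input whose size of 1000 rows is an arbitrary choice. The uncount variant returns a data frame rather than a matrix, so it is left out of this check.

```r
# Hypothetical wrappers around the two matrix-based solutions.
rep_rows_original <- function(m, reps) {
  matrix(rep(m, times = rep(reps, times = ncol(m))), ncol = ncol(m))
}

rep_rows_index <- function(m, reps) {
  # drop = FALSE keeps a matrix even if only one row is selected.
  m[rep(seq_along(reps), times = reps), , drop = FALSE]
}

a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)
small_agree <- identical(rep_rows_original(a, reps), rep_rows_index(a, reps))

# Larger input; the size and the range of counts are assumptions for illustration.
set.seed(1)
n <- 1000
big <- matrix(runif(n * 3), ncol = 3)
big_reps <- sample(1:5, n, replace = TRUE)
big_agree <- identical(rep_rows_original(big, big_reps),
                       rep_rows_index(big, big_reps))
```

Passing rep_rows_original(big, big_reps) and rep_rows_index(big, big_reps) to the same rbenchmark::benchmark() call would give timings on an input closer to the "large a" case described in the question.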