A solution with purrr may be the following:
df.result <- map2(.x = lapply(seq_len(nrow(responses)), function(i) responses[i,]),
                  .y = lapply(seq_len(nrow(order)), function(i) order[i,]),
                  .f = ~ .x[.y])
do.call("rbind", df.result)
In this code, .x and .y are lists of vectors, i.e. lists of the rows of responses and order respectively (following this post: https://stackoverflow.com/a/6821395/11086911). The output of map2 is then aggregated into a matrix with do.call and rbind.
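To make the row-splitting concrete, here is a small illustration (using a throwaway 2 x 3 matrix m, introduced here purely for demonstration) of what the lapply calls produce:

m <- matrix(1:6, nrow = 2, byrow = TRUE)
lapply(seq_len(nrow(m)), function(i) m[i, ])
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6

map2 then walks over the two lists in parallel: for each row i, .x is the i-th row of responses (a named numeric vector) and .y is the i-th row of order, so .x[.y] picks the answers by name in presentation order.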
In case anyone is curious how this compares to the other solutions, here is a benchmark.
library(microbenchmark)
library(purrr)
library(dplyr)   # for sample_n(), used in the big-data comparison below
set.seed(42) # For reproducibility purposes
# Comparison with given data
order.matrix <- matrix(c("Anger", "Happy", "Sad", "Happy", "Sad","Anger", "Sad", "Anger", "Happy"),
ncol=3,
byrow=TRUE)
df.responses <- matrix(c(1, 2, 3, 3, 2, 0, 9, 2, 1),
ncol=3,
byrow=TRUE)
colnames(df.responses) <- c("Anger", "Happy", "Sad")
solForLoop <- function(order, responses) {
  df.result <- responses
  colnames(df.result) <- paste0("V", 1:ncol(responses))
  for (i in 1:nrow(order)) {
    df.result[i,] <- responses[i, order[i,]]
  }
  df.result
}
solmApply <- function(order, responses) {
  t(mapply(FUN = function(x, y) x[y],
           as.data.frame(t(responses)),
           as.data.frame(t(order)),
           USE.NAMES = FALSE))
}
solPurrr <- function(order, responses) {
  df.result <- map2(.x = lapply(seq_len(nrow(responses)), function(i) responses[i,]),
                    .y = lapply(seq_len(nrow(order)), function(i) order[i,]),
                    .f = ~ .x[.y])
  do.call("rbind", df.result)
}
microbenchmark::microbenchmark(
  solForLoop(order.matrix, df.responses),
  solmApply(order.matrix, df.responses),
  solPurrr(order.matrix, df.responses),
  times = 1000L,
  check = "equivalent"
)
# Unit: microseconds
#                                    expr     min      lq      mean   median       uq       max neval
#  solForLoop(order.matrix, df.responses)   8.601  11.101  15.03331  15.9010  17.3020    62.002  1000
#   solmApply(order.matrix, df.responses) 313.801 346.701 380.32261 357.7510 374.2010 14322.900  1000
#    solPurrr(order.matrix, df.responses)  49.900  61.301  70.68950  70.7015  75.8015   190.700  1000
Given that the data is from a questionnaire, I will assume that every value in an order.matrix row can occur only once; a quick check of this assumption is sketched below, followed by the comparison for a matrix with the same 3 columns but 100 000 rows.
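As a sanity check of that per-row-uniqueness assumption (a sketch only, not part of the timings), one could run:

# Check that no emotion appears more than once within any row of order.matrix
all(apply(order.matrix, 1, function(r) !anyDuplicated(r)))
# [1] TRUE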
# Comparison for big data
order.matrix.big <- as.matrix(sample_n(as.data.frame(order.matrix), 100000, replace = TRUE))
df.responses.big <- as.matrix(sample_n(as.data.frame(df.responses), 100000, replace = TRUE))
microbenchmark::microbenchmark(
  solForLoop(order.matrix.big, df.responses.big),
  solmApply(order.matrix.big, df.responses.big),
  solPurrr(order.matrix.big, df.responses.big),
  times = 100L,
  check = "equivalent"
)
# Unit: milliseconds
#                                             expr       min        lq      mean    median        uq       max neval
#  solForLoop(order.matrix.big, df.responses.big)  110.2585  130.0916  163.3382  142.4249  167.7584  514.7262   100
#   solmApply(order.matrix.big, df.responses.big) 4669.8815 4866.6152 5232.1814 5160.2967 5385.5000 6568.1718   100
#    solPurrr(order.matrix.big, df.responses.big)  441.6195  502.0853  697.7207  669.4963  871.9122 1218.6721   100
So while map2 provides an interesting way of 'looping' over rows, in this case it is not as fast as a simple for loop.