Here is an approach using matrix multiplication, based on an example in https://slowkow.com/notes/sparse-matrix/. First, let's create a sparse matrix to play with,
library(magrittr)
library(forcats)
library(stringr)
library(Matrix)
set.seed(42)
m <- sparseMatrix(
  i = sample(x = 1e4, size = 1e4),
  j = sample(x = 1e4, size = 1e4),
  x = rnorm(n = 1e4)
)
colnames(m) <- str_c("col", seq(ncol(m)))
rownames(m) <- str_c("row", seq(nrow(m)))
and a grouping vector defining which rows to sum,
group <- sample(1:10, nrow(m), replace = TRUE) %>%
  paste0("new_row", .) %>%
  fct_inorder()
Whether group is a factor, and if so the order of its levels, determines the row order of the merged matrix. I made group a factor with levels ordered by first appearance so that the row order matches that of rowsum() with reorder = FALSE.
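To see why the level order matters, here is a minimal sketch on a toy grouping vector. The names g_chr, g_sorted and g_inorder are made up for illustration, and factor(x, levels = unique(x)) is used as a base-R stand-in for what fct_inorder() does on a character vector.

```r
library(Matrix)

# Toy grouping vector: "b" appears first but sorts after "a"
g_chr <- c("b", "a", "b", "c")

# Default factor(): levels are sorted alphabetically
g_sorted <- factor(g_chr)
# Levels in order of first appearance (base-R stand-in for fct_inorder())
g_inorder <- factor(g_chr, levels = unique(g_chr))

levels(g_sorted)
#> [1] "a" "b" "c"
levels(g_inorder)
#> [1] "b" "a" "c"

# The model-matrix columns (rows after transposing) follow the level order,
# so the level order becomes the row order of the summed matrix
cols_sorted  <- colnames(sparse.model.matrix(~ 0 + g_sorted))
cols_inorder <- colnames(sparse.model.matrix(~ 0 + g_inorder))
```

With a plain character vector or a default factor, the summed rows would come out in alphabetical order instead.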
Next, we create a (sparse) indicator matrix with one row per level of group and one column per row of m. Left-multiplying m by this matrix gives a version of m whose rows have been summed by group,
group_mat <- sparse.model.matrix(~ 0 + group) %>% t
# Adjust row names to get the correct final row names
rownames(group_mat) <- rownames(group_mat) %>% str_extract("(?<=^group).+")
msum <- group_mat %*% m
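The same multiplication can be followed by hand on a toy example. This is only an illustrative sketch: m_toy, g_toy and ind are made-up names, and the row names are set from levels(g_toy) rather than with the regular expression used above.

```r
library(Matrix)

# Toy 4 x 3 sparse matrix
m_toy <- sparseMatrix(
  i = c(1, 2, 3, 4, 4),
  j = c(1, 2, 3, 1, 3),
  x = c(1, 2, 3, 4, 5),
  dimnames = list(paste0("row", 1:4), paste0("col", 1:3))
)

# Rows 1 and 3 belong to group "a", rows 2 and 4 to group "b"
g_toy <- factor(c("a", "b", "a", "b"), levels = c("a", "b"))

# Indicator matrix: one row per group, one column per original row;
# entry [g, r] is 1 when row r of m_toy belongs to group g
ind <- t(sparse.model.matrix(~ 0 + g_toy))
rownames(ind) <- levels(g_toy)

# Each output row is the sum of the input rows in that group:
# a = row1 + row3 = (1, 0, 3); b = row2 + row4 = (4, 2, 5)
summed <- ind %*% m_toy
as.matrix(summed)
```

Because both factors stay sparse, only the nonzero entries of m_toy take part in the sums.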
The result matches base::rowsum() on the dense version of the matrix,
d <- as.matrix(m)
dsum <- rowsum(d, group, reorder = FALSE)
all.equal(as.matrix(msum), dsum)
#> [1] TRUE
but the sparse-matrix multiplication is much faster, roughly 400 times by median time in this benchmark,
bench::mark( msum <- group_mat %*% m )$median
#> [1] 344µs
bench::mark( dsum <- rowsum(d, group) )$median
#> [1] 146ms