If your matrix is large, then it makes sense to compute counts rowwise or columnwise to conserve memory, and apply
is a valid way to go about this.
Conceptually, this answer is not unlike the one I provided here for data frames. I will once again recommend tabulate
over table
; it is much more efficient.
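To see why, compare the two on the same input: table() builds a named, classed object, while tabulate() on the integer codes of a factor returns a bare integer vector with none of that overhead. A minimal sketch:

```r
x <- c("2", "0", "0", "1", "0")
levels <- c("0", "1", "2")

# table() returns a named, classed object:
table(x)
## x
## 0 1 2 
## 3 1 1 

# tabulate() counts the factor's integer codes directly,
# one bin per level, and returns a plain integer vector:
tabulate(factor(x, levels), nbins = length(levels))
## [1] 3 1 1
```

The nbins argument matters: it guarantees a count (possibly zero) for every level, even levels absent from x.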
set.seed(1L)
m <- 5L
n <- 4L
A <- matrix(sample(c("0", "1", "2"), size = m * n, replace = TRUE), m, n)
A
##      [,1] [,2] [,3] [,4]
## [1,] "0"  "2"  "2"  "1"
## [2,] "2"  "2"  "0"  "1"
## [3,] "0"  "1"  "0"  "1"
## [4,] "1"  "1"  "0"  "2"
## [5,] "0"  "2"  "1"  "0"
f <- function(x, levels) tabulate(factor(x, levels), length(levels))
rowSums(apply(A, 1L, f, c("0", "1", "2"))) # if 'A' has more columns than rows
## [1] 7 7 6
rowSums(apply(A, 2L, f, c("0", "1", "2"))) # if 'A' has more rows than columns
## [1] 7 7 6
You are going to want apply
to loop over the smaller dimension of your matrix, so choose the second (MARGIN) argument accordingly. If your matrix actually has millions of rows and only 18 columns, then use the second statement above, not the first.
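Incidentally, if what you need are per-column (or per-row) counts rather than grand totals, just drop the rowSums() wrapper: apply() already returns one column of counts per margin element. A quick sketch, reusing f on a hard-coded copy of the small A from above so it runs on its own:

```r
f <- function(x, levels) tabulate(factor(x, levels), length(levels))

# Same 5 x 4 matrix as above, written out literally (column-major):
A <- matrix(c("0", "2", "0", "1", "0",
              "2", "2", "1", "1", "2",
              "2", "0", "0", "0", "1",
              "1", "1", "1", "2", "0"), 5L, 4L)

# One column of counts per column of A; row i holds the count of level i:
counts <- apply(A, 2L, f, c("0", "1", "2"))
rownames(counts) <- c("0", "1", "2")
counts
##   [,1] [,2] [,3] [,4]
## 0    3    0    3    1
## 1    1    2    1    3
## 2    1    3    1    1
```

Summing across that matrix with rowSums() recovers the grand totals 7 7 6 shown above.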
Here is a test using a matrix with your dimensions. It only takes ~10 seconds on my machine, so parallelization might be overkill.
set.seed(1L)
m <- 3e+07L
n <- 18L
A <- matrix(sample(c("0", "1", "2"), m * n, replace = TRUE), m, n)
system.time(rowSums(apply(A, 2L, f, c("0", "1", "2"))))
## user system elapsed
## 8.195 2.816 12.322
Just for fun:
library("parallel")
system.time(Reduce(`+`, mclapply(seq_len(n), function(i) f(A[, i], c("0", "1", "2")), mc.cores = 4L)))
## user system elapsed
## 3.924 0.904 3.497
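One last remark: the whole point of looping with apply() here is to keep memory use down, since each call only builds factor codes for one row or column at a time. If your matrix fits comfortably in memory, the grand totals can also be had in a single pass, because factor() silently flattens a matrix to a vector; the cost is an integer vector of codes the size of the full matrix. A sketch on the small A from above (hard-coded so it runs on its own):

```r
f <- function(x, levels) tabulate(factor(x, levels), length(levels))

# Same 5 x 4 matrix as above, written out literally (column-major):
A <- matrix(c("0", "2", "0", "1", "0",
              "2", "2", "1", "1", "2",
              "2", "0", "0", "0", "1",
              "1", "1", "1", "2", "0"), 5L, 4L)

# factor() coerces the matrix to a vector, so f() tabulates all of A at once:
f(A, c("0", "1", "2"))
## [1] 7 7 6
```

I have not benchmarked this variant on the 3e7-row case; for totals only, it trades the loop for one large allocation.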