I am sorry if this is a silly question. I am looking to optimize my code, but I am new to R and do not know where to start.
I have a matrix X whose rows are labeled by the elements of a vector y. The labels are numeric and take values in {1,...,K}. For each label i, I want to compute the column sums of the submatrix of X made up of the rows with that label, and store them in row i of M. To make this clearer, here is my current code:
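for (i in 1:K) {
    # Logical mask of the rows of X that carry label i
    cluster = (y == i)
    if (any(cluster)) {
        # drop = FALSE keeps a single matching row as a 1-row matrix
        clusterRows = X[cluster, , drop = FALSE]
        M[i, ] = colSums(clusterRows)
    }
}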
Is there a better, more efficient way to do this? By efficient, I mean faster in terms of running time.
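In case it helps with comparing suggestions, this is roughly how the running time of the loop could be measured in base R. The sizes n, p, and K below and the function name loop_col_sums are made up for illustration and are not my actual data; a single system.time() run is noisy, so this is only a rough sketch.

```r
# Hypothetical problem size, chosen only so that the timing is measurable.
set.seed(1)
n <- 1e5
p <- 10
K <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- sample.int(K, n, replace = TRUE)

# The loop from the question, wrapped in a function (illustrative name).
loop_col_sums <- function(X, y, K) {
    M <- matrix(0, nrow = K, ncol = ncol(X))
    for (i in seq_len(K)) {
        cluster <- (y == i)
        if (any(cluster)) {
            M[i, ] <- colSums(X[cluster, , drop = FALSE])
        }
    }
    M
}

# system.time() reports user, system, and elapsed seconds for one run.
timing <- system.time(M <- loop_col_sums(X, y, K))
print(timing[["elapsed"]])
```

Any alternative could be timed the same way on the same X, y, and K, ideally over several repetitions.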
EDIT: Example.
Input:
set.seed(1)
X = matrix(rnorm(100*2), nrow = 100, ncol = 2)
y = rep(1:2, 50)
M = matrix(rep(0,4), 2)
K = 2
Output:
         [,1]      [,2]
[1,] 9.776280 -2.595435
[2,] 1.112457 -1.185373
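One base R candidate that might be worth comparing against the loop is rowsum(), which sums the rows of a matrix within groups. The sketch below only checks it against the loop on the example input, assuming the labels are integers in 1..K; the names M_loop, M_rowsum, and sums are illustrative. Whether it is actually faster would still need to be timed on real data.

```r
# Reproduce the example input from the question.
set.seed(1)
X <- matrix(rnorm(100 * 2), nrow = 100, ncol = 2)
y <- rep(1:2, 50)
K <- 2

# Reference result: the original loop.
M_loop <- matrix(0, nrow = K, ncol = ncol(X))
for (i in seq_len(K)) {
    cluster <- (y == i)
    if (any(cluster)) {
        M_loop[i, ] <- colSums(X[cluster, , drop = FALSE])
    }
}

# Candidate: rowsum() sums the rows of X within each group of y.
# It returns one row per label that actually occurs, with the label as
# the row name, so the row names are used to place the sums in a K-row matrix.
sums <- rowsum(X, group = y)
M_rowsum <- matrix(0, nrow = K, ncol = ncol(X))
M_rowsum[as.integer(rownames(sums)), ] <- sums

# Compare the two results up to floating-point rounding.
print(all.equal(M_loop, M_rowsum))
```

Labels in 1..K that never occur in y keep a row of zeros, which matches what the loop does when any(cluster) is FALSE.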
EDIT 2: I am not using any libraries besides base.
Here is my sessionInfo():
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.3
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-7 compiler_3.4.4 tools_3.4.4