I am experiencing substantially slower matrix multiplication in R compared to Python for large matrices. For example (in Python):
import numpy as np
A = np.random.rand(4112, 23050).astype('float32')
B = np.random.rand(23050, 2500).astype('float32')
%timeit np.dot(A, B)
1 loops, best of 3: 1.09 s per loop
Here is the equivalent multiplication in R (takes almost 10x longer):
A <- matrix(rnorm(4112*23050), ncol = 23050)
B <- matrix(rnorm(23050*2500), ncol = 2500)
system.time(A %*% B)
user system elapsed
72.032 1.048 9.444
How can I achieve matrix multiplication speeds in R that are comparable to what is standard with Python?
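For reference, here is a quick way to confirm which BLAS/LAPACK libraries R is actually linked against (sessionInfo() reports the library paths in R >= 3.4.0):
sessionInfo()   # check the "BLAS:" and "LAPACK:" lines of the output
La_version()    # version of the LAPACK implementation in use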
What I Have Already Tried:
1) Part of the discrepancy seems to be that Python supports float32 whereas R only uses numeric, which is double precision (the same as float64). For example, the same Python commands as above except with float64 take twice as long, but R is still ~4x slower than that (a single-precision sketch for R follows the code below):
import numpy as np
A = np.random.rand(4112, 23050).astype('float64')
B = np.random.rand(23050, 2500).astype('float64')
%timeit np.dot(A, B)
1 loops, best of 3: 2.24 s per loop
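One avenue I have not benchmarked yet: single precision is available in R through the contributed float package. A minimal sketch, assuming float is installed and the underlying BLAS provides a fast sgemm:
library(float)
A32 <- fl(A)                      # convert to float32 storage
B32 <- fl(B)
system.time(C32 <- A32 %*% B32)   # %*% on float32 matrices dispatches to single-precision BLAS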
2) I am using the OpenBLAS linear algebra back-end for R.
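To rule out a thread-count misconfiguration on the BLAS side, this is the kind of check I run (a sketch assuming the RhpcBLASctl package; setting OPENBLAS_NUM_THREADS in the environment should work as well):
library(RhpcBLASctl)
blas_get_num_procs()      # threads the BLAS is currently configured to use
blas_set_num_threads(8)   # explicitly request all 8 threads
system.time(A %*% B)
The user time of 72s against an elapsed time of 9.4s above suggests all 8 threads are already active, so this mainly confirms the OpenBLAS setup.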
3) RcppEigen, as detailed in the answer to this SO question (see the link for the test.cpp file). The multiplication is about twice as fast in "user" time, but 3x slower in the more critical elapsed time, since it uses only 1 of my 8 threads.
library(Rcpp)
sourceCpp("test.cpp")
A <- matrix(rnorm(4112*23050), nrow = 4112)
B <- matrix(rnorm(23050*2500), ncol = 2500)
system.time(res <- eigenMatMult(A, B))
user system elapsed
29.436 0.056 29.551
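A possible fix for the single-threaded Eigen result, which I have not fully verified: Eigen parallelizes its matrix products when built with OpenMP, so recompiling test.cpp with the OpenMP flags might recover the missing threads (a sketch assuming gcc/clang with OpenMP support; eigenMatMult is the function from the linked test.cpp):
library(Rcpp)
Sys.setenv(PKG_CXXFLAGS = "-fopenmp", PKG_LIBS = "-fopenmp")   # compile flags picked up by sourceCpp
Sys.setenv(OMP_NUM_THREADS = "8")                              # let Eigen's OpenMP use all 8 threads
sourceCpp("test.cpp", rebuild = TRUE)
system.time(res <- eigenMatMult(A, B))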