What method underlies numpy's matrix multiplication? My understanding is that it uses BLAS, but I get very different runtimes when I perform a matrix multiplication in a Jupyter notebook versus a direct call to blas::gemm in C++. For example, this code:

    import numpy
    import time

    N = 1000
    mat = numpy.random.rand(N, N)       # float64 by default
    start = time.perf_counter()
    m = mat @ mat                       # dispatched to whatever BLAS numpy links against
    print(time.perf_counter() - start)

Gives a runtime of about 0.25 sec on my laptop. This code:

    // n, st and fn were not declared in the original excerpt; filled in here for
    // completeness. blas::gemm is assumed to be the Fortran-style wrapper bundled
    // with Armadillo (see the comments below).
    const int n = 1000;
    float *A_ = new float[n*n], *C_ = new float[n*n];

    for (long i = 0; i < n*n; i++)
        A_[i] = 1.1f;                   // single-precision input

    const char tran = 'N';  float alpha = 1, beta = 0;
    clock_t st = clock();               // CPU-time stamp from <ctime>
    blas::gemm(&tran, &tran, &n, &n, &n, &alpha, A_, &n, A_, &n, &beta, C_, &n);
    clock_t fn = clock();
    cout << float(fn - st) / float(CLOCKS_PER_SEC) << endl;

    delete[] A_;  delete[] C_;

takes about 0.85 sec to run. So I'm guessing numpy is using something other than blas::gemm. Does anyone know what it's using under the hood? Thanks in advance.
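One caveat when comparing these two timings: numpy.random.rand returns float64 arrays, while the C++ code multiplies float (single-precision) matrices, so the two calls resolve to dgemm and sgemm respectively. A minimal sketch to put the Python side on single precision as well, assuming only that numpy was built against some BLAS:

    import numpy
    import time

    N = 1000
    # rand() returns float64; astype(numpy.float32) matches the C++ code's precision
    mat = numpy.random.rand(N, N).astype(numpy.float32)
    start = time.perf_counter()
    m = mat @ mat                       # now exercises the single-precision gemm path
    print(time.perf_counter() - start)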

DJM123
  • Are you using the same BLAS library? – norok2 Dec 08 '19 at 15:12
  • @norok2: It's the version that came with the latest version of Armadillo; it looks to have been built in 2013. – DJM123 Dec 08 '19 at 21:58
  • I am not sure how to reply, but basically different BLAS libraries (ATLAS, OpenBLAS, MKL, etc.) will have different performance. Unless you are using the same BLAS library and version, there is nothing surprising about your observations. – norok2 Dec 08 '19 at 22:43
  • @norok2: Thanks. Is there any way to find out what version numpy is using? – DJM123 Dec 08 '19 at 22:44
  • https://stackoverflow.com/questions/9000164/how-to-check-blas-lapack-linkage-in-numpy-and-scipy (see the snippet below) – norok2 Dec 08 '19 at 22:46
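Per the linked question, numpy can report the BLAS/LAPACK configuration it was built against, for example:

    import numpy

    # Prints build information, including which BLAS/LAPACK libraries numpy
    # links to (output format varies across numpy versions).
    numpy.show_config()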

0 Answers