What method underlies numpy's matrix multiplication? My understanding is that it uses BLAS, but I get very different runtimes when I perform a matrix multiplication in a Jupyter notebook versus a direct call to blas::gemm in C++. For example, this code:
import numpy
import time
N = 1000
mat = numpy.random.rand(N, N)
start = time.perf_counter()
m = mat@mat
print(time.perf_counter()-start)
Gives a runtime of about 0.25 seconds on my laptop. This C++ code:
int n = 1000;  // same matrix size as the numpy example
float *A_ = new float[n*n], *C_ = new float[n*n];
for (long i = 0; i < n*n; i++)
    A_[i] = 1.1f;
const char tran = 'N'; float alpha = 1, beta = 0;
clock_t st = clock();
// Single-precision (float) GEMM in the Fortran-style calling convention:
// C_ = alpha * A_ * A_ + beta * C_, column-major, leading dimension n.
blas::gemm(&tran, &tran, &n, &n, &n, &alpha, A_, &n, A_, &n, &beta, C_, &n);
clock_t fn = clock();
cout << float(fn - st) / float(CLOCKS_PER_SEC) << endl;
delete[] A_; delete[] C_;
takes about 0.85 seconds to run. So I'm guessing numpy is using something other than blas::gemm. Does anyone know what it's using under the hood?
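In case it's relevant, numpy.show_config() reports which BLAS/LAPACK libraries a given numpy build links against (e.g. OpenBLAS, MKL, or the reference BLAS), and repeating the timing with a float32 matrix would match the single precision used in the C++ call. A rough sketch of what I mean, assuming the same N as above (the config output will depend on the particular numpy build):
import numpy
import time

# Print which BLAS/LAPACK implementation this numpy build links against.
numpy.show_config()

N = 1000
# random.rand gives float64; cast to float32 to match the float arrays in the C++ code.
mat32 = numpy.random.rand(N, N).astype(numpy.float32)

start = time.perf_counter()
m = mat32 @ mat32
print(time.perf_counter() - start)  # wall-clock time for the float32 product
Thanks in advance.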