I am trying to reproduce matrix multiplication using Numba. Here is the code:

import numpy as np
import timeit
from numba import jit, float64, prange


@jit('float64[:,:](float64[:,:],float64[:,:])', parallel=True, nopython=True)
def matmul(A, B):
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in prange(A.shape[0]):
        for j in prange(B.shape[1]):
            for k in range(A.shape[1]):  # inner dimension: A.shape[1] == B.shape[0]
                C[i,j] = C[i,j] + A[i,k]*B[k,j]
    return C



if __name__ == '__main__':
    m_size = 1000
    num_loops = 10
    A = np.random.rand(m_size, m_size)
    B = np.random.rand(m_size, m_size)

    # Numpy
    start = timeit.default_timer()
    for i in range(num_loops):
        A.dot(B)
    stop = timeit.default_timer()
    execution_time = stop - start
    print("Numpy Executed in ", execution_time)


    # Numba
    start = timeit.default_timer()
    for i in range(num_loops):
        matmul(A, B)
    stop = timeit.default_timer()
    execution_time = stop - start
    print("Numba Executed in ", execution_time) 

And here is the output:

Numpy Executed in  0.713342247006949
Numba Executed in  17.631791604988393

In a related post, the performance of Numba and NumPy were really close. What am I doing wrong, and how could I improve the performance of the matmul function?

Robin
  • Why not simply call np.dot(A,B) in Numba (which is actually a call to SciPy's BLAS backend)? Implementing an efficient matrix multiplication for larger matrices is not that simple https://gist.github.com/nadavrot/5b35d44e8ba3dd718e595e40184d03f0 – max9111 Jun 19 '19 at 14:46
  • My goal is to implement a different version of matrix multiplication, where instead of taking the sum of the products, I would take the minimum of the products. Moreover, I would like to do this for sparse matrices. – Robin Jun 19 '19 at 14:48
  • The post you are comparing your function's performance to was using an array `B` with size `(N, 3)`, which looks like it has very different performance characteristics compared to your `(N,N)` where `N` is large, and isn't able to take advantage of the algorithmic tricks that `BLAS` is using in this regime where they make a big difference. Without changing your algorithm, I don't think numba can do anything special to help you. – JoshAdel Jun 19 '19 at 15:12
  • Ok thank you, I'll try another way then ! – Robin Jun 19 '19 at 15:14

0 Answers