While evaluating ways to speed up Python code, I came across this Stack Overflow post: Comparing Python, Numpy, Numba and C++ for matrix multiplication
I was quite impressed with Numba's performance and implemented some of our functions in Numba. Unfortunately, the speedup only appeared for very small matrices; for large matrices the code became very slow compared to our previous scipy sparse implementation. I thought this made sense, but I nevertheless repeated the test from the original post (code below).
According to that post, even the pure Python implementation should take roughly 0.01 s for a 1000 x 1000 matrix. Here are my results, though:
python : 769.6387 seconds
numpy : 0.0660 seconds
numba : 3.0779 seconds
scipy : 0.0030 seconds
What am I doing wrong to get results so different from the original post? I copied the functions and did not change anything. I tried both Python 3.5.1 (64-bit) and Python 2.7.10 (32-bit), and a colleague ran the same code with the same results. These are the results for a 100 x 100 matrix:
python : 0.6916 seconds
numpy : 0.0035 seconds
numba : 0.0015 seconds
scipy : 0.0035 seconds
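As a rough sanity check on the two sets of numbers, here is a small sketch that extrapolates the 100 x 100 timings to 1000 x 1000. It assumes the triple loop costs about n**3 multiply-adds and ignores cache effects; the dictionary names are just for illustration.

```python
# Timings copied from the two result lists in this post (seconds).
measured_100 = {"python": 0.6916, "numba": 0.0015}
measured_1000 = {"python": 769.6387, "numba": 3.0779}

# Assumption: the triple loop does n**3 multiply-adds, so the run time
# should grow by roughly (1000 / 100)**3 when n goes from 100 to 1000.
scale = (1000 / 100) ** 3

predicted_1000 = {}
for name in ("python", "numba"):
    predicted_1000[name] = measured_100[name] * scale
    ratio = measured_1000[name] / predicted_1000[name]
    print("%s : predicted %.1f s, measured %.1f s, ratio %.2f"
          % (name, predicted_1000[name], measured_1000[name], ratio))
```

Under that assumption the pure Python loop would need on the order of 690 seconds for the 1000 x 1000 case, which is close to what I measured and nowhere near 0.01 s. The numba timing comes out about twice as slow as the cubic extrapolation.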
Did I make some obvious mistake?
import numpy as np
import numba as nb
import scipy.sparse
import time
class benchmark(object):
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        self.start = time.time()
    def __exit__(self, ty, val, tb):
        end = time.time()
        print("%s : %0.4f seconds" % (self.name, end-self.start))
        return False
def dot_py(A, B):
    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    for i in range(0, m):
        for j in range(0, p):
            for k in range(0, n):
                C[i, j] += A[i, k] * B[k, j]
    return C
def dot_np(A, B):
    C = np.dot(A,B)
    return C
def dot_scipy(A, B):
    C = A * B
    return C
dot_nb = nb.jit(nb.float64[:,:](nb.float64[:,:], nb.float64[:,:]), nopython=True)(dot_py)
dim_x = 1000
dim_y = 1000
a = scipy.sparse.rand(dim_x, dim_y, density=0.01)
b = scipy.sparse.rand(dim_x, dim_y, density=0.01)
a_full = a.toarray()
b_full = b.toarray()
print("starting test")
with benchmark("python"):
    dot_py(a_full, b_full)
with benchmark("numpy"):
    dot_np(a_full, b_full)
with benchmark("numba"):
    dot_nb(a_full, b_full)
with benchmark("scipy"):
    dot_scipy(a, b)
print("finishing test")
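Since several of the timings above are only a few milliseconds, a single time.time() measurement may be noisy. Below is a minimal sketch of a repeat-and-take-the-fastest helper. The names `best_of`, `repeats` and `busy_sum` are hypothetical and not part of the code above, and the workload is a stand-in so the snippet runs on its own.

```python
import timeit


def best_of(func, *args, repeats=3):
    """Call func(*args) `repeats` times and return the fastest wall time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = timeit.default_timer()   # highest-resolution clock timeit knows about
        func(*args)
        elapsed = timeit.default_timer() - start
        best = min(best, elapsed)
    return best


def busy_sum(n):
    # Stand-in workload so the example does not depend on numpy or numba.
    total = 0
    for i in range(n):
        total += i * i
    return total


print("busy_sum : %0.6f seconds" % best_of(busy_sum, 100000, repeats=5))
```

With the functions from this post, the call would look like best_of(dot_np, a_full, b_full). Taking the minimum over a few runs is one common way to reduce the influence of other processes on the measurement, but it does not change the overall picture for the slow cases.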
Edit:
For anyone seeing this at a later time: these are the results I got when using sparse n x n matrices (1% of elements are nonzero).