I'm trying to speed up my matrix multiplication program. With `@jit(nopython=True)` the same program runs in 346 ms, and I'd like to make it faster by using CUDA.
I appreciate any help you can give me!
Here is the error and the code:
---> 19 @vectorize(['(float32, float32, float32)'], target='cuda')
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function len>) found for signature:
>>> len(float32)
There are 16 candidate implementations:
- Of which 16 did not match due to:
Overload of function 'len': File: <numerous>: Line N/A.
With argument(s): '(float32)':
No match.
```python
# np and cp are used below but their imports were not shown; assumed NumPy and CuPy
import numpy as np
import cupy as cp
from numba import cuda, float32, prange, vectorize

# Input and result matrices
matrix1 = cp.random.uniform(1, 10, size=(1000, 1000), dtype=np.float64)
matrix2 = cp.random.uniform(1, 10, size=(1000, 1000), dtype=np.float64)
rmatrix = cp.zeros(shape=(1000, 1000), dtype=np.float64)

# Multiplication function (this decorator is where the TypingError is raised)
@vectorize(['(float32, float32, float32)'], target='cuda')
def gpu_matrix_multiplication(matrix1, matrix2, rmatrix):
    for i in prange(len(matrix1)):
        for j in prange(len(matrix2[0])):
            for k in prange(len(matrix2)):
                rmatrix[i][j] += matrix1[i][k] * matrix2[k][j]

# Calculate running time
%timeit gpu_matrix_multiplication(matrix1, matrix2, rmatrix)
```
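A note on what the `len(float32)` in the error seems to point at: `@vectorize` compiles a function that takes scalars and applies it element by element, so inside the function `matrix1` is a single `float32`, not an array. The sketch below imitates that behaviour in plain Python, without Numba or a GPU. The helper `elementwise` and the two sample functions are hypothetical names made up for illustration, not part of Numba.

```python
def elementwise(func, a, b):
    """Hypothetical stand-in for a ufunc: call func once per pair of scalars."""
    return [
        [func(x, y) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def scalar_multiply(x, y):
    # Works: x and y are individual numbers, which is what a ufunc body receives.
    return x * y


def matrix_style(x, y):
    # Fails: written as if x were a whole matrix, but x is a single float here.
    return len(x)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# Element-wise product, not a matrix product.
product = elementwise(scalar_multiply, a, b)

# Calling len() on a scalar raises TypeError, mirroring the typing failure above.
try:
    elementwise(matrix_style, a, b)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(product)
print(error_message)
```

Since a matrix product needs access to whole rows and columns rather than one element at a time, a `@cuda.jit` kernel or `@guvectorize` may be closer to what the loops in my function need than `@vectorize`.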