I have a large number of 2x2 matrices that I need multiplied together.
I can initiate the general shape of the problem as
import numpy as np
import time
A_dim = 6*6
B_dim = 2**8
C_dim = B_dim
A = np.random.rand(A_dim,A_dim,2,2)
B = np.random.rand(B_dim,2,2)
C = np.random.rand(C_dim,2,2)
tic = time.perf_counter()
X = A[None,None,:,:,:,:] @ B[:,None,None,None,:,:] @ A[None,None,:,:,:,:] @ C[None,:,None,None,:,:]
toc = time.perf_counter()
print(f"matrix multiplication took {toc - tic:0.4f} seconds")
where I get
matrix multiplication took 14.4403 seconds
This seems to be a vectorized implementation, but is there anything I can do to speed this up? My native Numpy library runs this on only one core, so possibly if I figured out how to get OpenBLAS to work, this would be faster? The problem is that each matrix operation takes a very small amount of time to complete. Is there a better way to construct such a multidimensional array?