Currently, I'm in the process of optimizing a piece of code I have. I'm trying to speed up a bunch of matrix multiplications by reducing the sizes of the matrix dimensions. However, in some cases, using both NumPy and MATLAB, I'm not able to obtain the speed-ups I expect. Using MATLAB, I first defined 2 randomized matrices: bigger_mat, which is 10000x10000, and smaller_mat, which is 10000x100. I then created 2 smaller matrices by slicing bigger_mat and smaller_mat, such that I get matrix dimensions of 200x10000 (bigger_mat_sliced) and 10000x2 (smaller_mat_sliced).
% Defining full (big) array dimension
dim_big = 100;
% Defining sliced (small) array dimension.
dim_small = 2;
% Creating a 10000x10000 randomized array
bigger_mat = rand(dim_big^2, dim_big^2);
% Creating a 10000x100 randomized array
smaller_mat = rand(dim_big^2, dim_big);
% Slicing bigger_mat to obtain a 200x10000 array
bigger_mat_sliced = bigger_mat(1:dim_small * dim_big, :);
% Slicing smaller_mat to obtain a 10000x2 array
smaller_mat_sliced = smaller_mat(:, 1:dim_small);
I then measured the runtimes of the following 3 matrix multiplications (timing sketch below):

Multiplication #1: bigger_mat * smaller_mat
Multiplication #2: bigger_mat_sliced * smaller_mat
Multiplication #3: bigger_mat * smaller_mat_sliced
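For reference, each product was timed along these lines (a minimal tic/toc sketch; tic/toc is what produces the "Elapsed time is ... seconds" messages shown further down):

% Minimal timing sketch using tic/toc
tic; C1 = bigger_mat * smaller_mat;        toc;  % multiplication #1
tic; C2 = bigger_mat_sliced * smaller_mat; toc;  % multiplication #2
tic; C3 = bigger_mat * smaller_mat_sliced; toc;  % multiplication #3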
My expectations were as follows:

Multiplication #1 should take the longest, since the two unsliced (full) matrices are being multiplied.
Multiplications #2 and #3 should take less time than #1, as in both cases I'm multiplying a full matrix by a sliced one. Specifically, #2 and #3 should take the same amount of time, and both should be 50 times faster than #1, since the sliced dimension is scaled down by a factor of dim_big/dim_small = 100/2 = 50.
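In terms of raw floating-point operations (roughly 2*m*k*n for an (m x k) times (k x n) product), that expectation works out as follows:

#1: 2 * 10000 * 10000 * 100 = 2.0e10 FLOPs
#2: 2 * 200 * 10000 * 100 = 4.0e8 FLOPs (50x fewer than #1)
#3: 2 * 10000 * 10000 * 2 = 4.0e8 FLOPs (50x fewer than #1)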
The timings I got were:

Multiplication #1 (bigger_mat * smaller_mat): Elapsed time is 0.110538 seconds.
Multiplication #2 (bigger_mat_sliced * smaller_mat): Elapsed time is 0.002564 seconds.
Multiplication #3 (bigger_mat * smaller_mat_sliced): Elapsed time is 0.068878 seconds.
While #2 behaves as expected, with a 43x speed-up over #1, #3 is only 1.6x faster than #1. I ran the same test with NumPy and got similar timings.
It seems to me like when multiplying 2 matrices of unequal outer dimensions, for example A(i,k) * B(k,j) where i >> j, slicing the larger of the 2 outer dimensions (i) by some factor scales down the multiplication time as expected. However, for some reason, scaling down (or slicing) the smaller outer dimension (j) yields barely any speed-up. I'm really having a hard time understanding these results. I tried looking up the matrix multiplication algorithms implemented in BLAS libraries, hoping to find an explanation, but I soon found myself out of my depth.
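To state the pattern generically, something along these lines reproduces what I'm seeing for arbitrary sizes (a hypothetical helper, not the code I originally ran; it uses timeit and pre-slices so that only the multiplication itself is measured):

% Hypothetical generic version of the experiment above: A is i-by-k, B is k-by-j, with i >> j
function compare_slicing(i, k, j, f)
    A = rand(i, k);
    B = rand(k, j);
    A_sliced = A(1:i/f, :);    % slice the larger outer dimension i by factor f
    B_sliced = B(:, 1:j/f);    % slice the smaller outer dimension j by factor f
    t_full = timeit(@() A * B);
    t_row  = timeit(@() A_sliced * B);
    t_col  = timeit(@() A * B_sliced);
    fprintf('full: %.6f s | sliced i: %.1fx faster | sliced j: %.1fx faster\n', ...
        t_full, t_full / t_row, t_full / t_col);
end
% e.g. compare_slicing(10000, 10000, 100, 50) matches the setup above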
Lastly, is there a way to make multiplication #3 as fast as #2? Thanks!