I need to compare the running time of two algorithms Alg_x, Alg_y. However, Alg_x contains many matrix multiplications, Alg_y contains many element-wise operations (e.g., summation and muliplicaton of each two numbers/vectors). Theoretically, Alg_x and Alg_y have the same running time. However, practically, Alg_x can run much faster than Alg_y, because matrix multiplications have been specially designed and optimized in Matlab.
Then, my problem is, how to close such 'code optimization' in order to fairly compare the running time and reflect the theoretical time complexity?
%%%%% X = randn(1000,2000);
Alg_x
tic;
temp = X*X';
toc
Alg_y
[d,n] = size(X);
temp = zeros(d,d);
tic;
for i =1:n
x = X(:,i);
temp = temp+x*x';
end
toc
The above two codes have the same output, while Alg_x runs much faster. Moreover, Alg_y will also run much faster after I remove x = X(:,i); temp = temp+x*x';, so I guess it is the for iteration that makes Alg_y run slow.
I do want to close and avoid such oprimizations. Below is something I extracted from Why is MATLAB so fast in matrix multiplication?
I am making some benchmarks with CUDA, C++, C#, and Java, and using MATLAB for verification and matrix generation. But when I multiply with MATLAB, 2048x2048 and even bigger matrices are almost instantly multiplied.
1024x1024 2048x2048 4096x4096
--------- --------- ---------
CUDA C (ms) 43.11 391.05 3407.99
C++ (ms) 6137.10 64369.29 551390.93
C# (ms) 10509.00 300684.00 2527250.00
Java (ms) 9149.90 92562.28 838357.94
MATLAB (ms) 75.01 423.10 3133.90
Only CUDA is competitive, but I thought that at least C++ will be somewhat close and not 60x slower.
So my question is - How is MATLAB doing it that fast?
C++ Code:
float temp = 0;
timer.start();
for(int j = 0; j < rozmer; j++)
{
for (int k = 0; k < rozmer; k++)
{
temp = 0;
for (int m = 0; m < rozmer; m++)
{
temp = temp + matice1[j][m] * matice2[m][k];
}
matice3[j][k] = temp;
}
}
timer.stop();
Edit: I also dont know what to think about the C# results. The algorithm is just the same as C++ and Java, but there's a giant jump 2048 from 1024?
Edit2: Updated MATLAB and 4096x4096 results