
I'm writing a finite difference method program.
I'm using the Intel Math Kernel Library (MKL).

For example, take a 1000x1000 matrix A and a 1000x1000 matrix B.
In Intel MKL, computing A*B with the cblas_dgemm() function took about 600 ms.
In MATLAB, A*B took about 800 ms.
So I thought MKL was very fast.

However, with a 1000x1000 matrix A and a 1000x1 vector B,
solving A\B (mldivide) in MATLAB took 40 ms,
but in MKL, using LAPACKE_dgesv, it took 400 ms!

So my question is:
why is MATLAB's mldivide so fast while MKL is so slow?

Matrix A and vector B are filled entirely with random values.

I'm using
MATLAB R2012b
Visual Studio 2015
Intel Parallel Studio XE 2016 Update 3 Cluster Edition

Thank you.

EDITED

First, the C++ code:

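#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "mkl.h"

int main(void)
{
    const int n = 1000;
    double *a = (double *)malloc(sizeof(double) * n * n);
    double *b = (double *)malloc(sizeof(double) * n);
    lapack_int *ipiv = (lapack_int *)malloc(sizeof(lapack_int) * n);
    for (int i = 0; i < n * n; i++) a[i] = rand();
    for (int i = 0; i < n; i++) b[i] = rand();

    clock_t start = clock();
    /* Solve A x = b by LU factorization; b is overwritten with x. */
    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, 1, a, n, ipiv, b, 1);
    /* clock() counts ticks, not milliseconds. */
    double ms = 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("info = %d, %f ms\n", (int)info, ms);

    free(a);
    free(b);
    free(ipiv);
    return 0;
}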

Second, the MATLAB code:

n = 1000;
a = rand(n,n);
b = rand(n,1);
tic;
c = a\b;
toc * 1000

I don't think I made a mistake in measuring the time.
Thank you.

  • @kangshiyin Right, I misread. Thanks. – rayryeng Jul 08 '16 at 17:15
  • Can you show your timing code? There are many things that can go wrong during timing. – kangshiyin Jul 08 '16 at 17:21
  • @kangshiyin I edited question. Thank you. –  Jul 08 '16 at 17:41
  • @ShuS `tic` and `toc` require warmup time. I suggest you use `timeit` instead. Do something like `t = timeit(@() a \ b);`, assuming you have declared `a` and `b` beforehand. `timeit` takes a bit longer to run because it performs the operation you want to time multiple times and returns the median of the measurements. This function was introduced in R2013b, but you can find it on the MATLAB File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/18798-timeit-benchmarking-function. In fact, it was so popular there that it became a built-in function as of R2013b. – rayryeng Jul 08 '16 at 17:46
  • @rayryeng Thank you. I understand that tic/toc requires warmup time. If so, mldivide (backslash) would be even faster, which makes the result even harder for me to believe. Matrix multiplication is 1.3 times faster in MKL, but mldivide is 10 times slower!! –  Jul 08 '16 at 17:59
  • These two links may also provide some insights: http://stackoverflow.com/questions/12831889/efficient-way-to-solve-for-x-in-ax-b-in-matlab-when-both-a-and-b-are-big-matrice, http://stackoverflow.com/questions/6058139/why-is-matlab-so-fast-in-matrix-multiplication. The second link comments on matrix multiplication instead of solving a linear system of equations, but it may provide insight as to the timing issues that you're experiencing. – rayryeng Jul 08 '16 at 18:04
  • Your comparison is not fair because MATLAB uses column-major matrices. – kangshiyin Jul 08 '16 at 18:07
  • Another thing is that MKL's LAPACK has many different routines that can solve your equation with different mathematical methods. I'm not sure which one MATLAB uses. But at least for matrix multiplication, MATLAB uses MKL's gemm. – kangshiyin Jul 08 '16 at 18:56
  • @kangshiyin For the example code given, Matlab is likely to use an [LU decomposition](http://www.mathworks.com/help/matlab/ref/mldivide.html#zmw57dd0e524568) like `dgesv` does since `rand` is unlikely to generate a matrix with any special structure. – TroyHaskin Jul 08 '16 at 21:28
  • @TroyHaskin Great diagram. Several MKL routines can solve the equation with an LU decomposition, such as `dgesv`, `dgesvx`, and `dgesvxx`, as well as the two-step `dgetrf` + `dgetrs`. I'm not sure which one MATLAB uses, if it uses MKL for this at all. – kangshiyin Jul 09 '16 at 06:59

1 Answer

You have a timing problem. Both MKL and MATLAB show a warm-up effect on this operation. If you time the MKL call twice in your C code, you will find that the second call is much faster:

247.349940 ms
14.353588 ms

For Matlab, the two timings are:

ans =

  825.5090

ans =

   21.7870

Note that the results also depend heavily on your multithreading settings.

The difference is that, unless you quit Matlab and start it again, you won't see the warm-up effect again, because Matlab holds on to the resources created in the warm-up stage until you quit. If you run the Matlab script again, those resources are reused. In your C program, by contrast, you time the solve only once, so you always measure it together with the warm-up phase; then the program exits and releases the resources without ever reusing them.

Warm-up is not a Matlab-specific phenomenon. The resources could be pre-allocated, reusable memory buffers that MKL's LAPACK uses for solving the equations, which would affect both the C program and the Matlab script.

kangshiyin
  • Thank you. I changed the DGESV layout argument to LAPACK_COL_MAJOR and measured 100 runs in both MATLAB and MKL (the first run includes the warm-up time; the other 99 presumably do not). The result: MKL is 1.76 times faster!! Thank you –  Jul 11 '16 at 04:25