The code is like this:

  for(int i = 0; i < loop_count; i++)
    cblas_sgemm(<paras group A>);

When the matrices are not very large, the fork-join cost of each call becomes significant, especially when the code runs on MIC. Splitting the work by hand also causes problems on MIC, as "MKL Performance on Intel Phi" shows.

  // Split the left and result matrices by hand.
  // Not a wise solution on MIC.
  #pragma omp parallel
  for(int i = 0; i < loop_count; i++)
    cblas_sgemm(<paras group B>);

Is there a technique that would let me write code like this:

  #pragma omp parallel
  for(int i = 0; i < loop_count; i++)
    cblas_sgemm(<paras group A>);

where cblas_sgemm reuses the threads forked outside the for loop, since MKL also uses OpenMP to create its threads?

Sincerely, FatRabb1t.

  • MKL parallel calls have `#pragma omp parallel` internally. So your first code segment will be running in parallel already. Your other calls don't make any sense because you are no longer spreading out the work among the threads. Perhaps you meant `#pragma omp parallel for`? – NoseKnowsAll Mar 08 '16 at 22:07
  • I am sorry for asking such a confusing question. I want to reduce the number of fork-join operations, but it seems impossible. Thanks, NoseKnowsAll. – SYSU FatRabb1t Mar 09 '16 at 15:02

1 Answer

You could do that by linking the sequential version of MKL, so that cblas_sgemm does not fork multiple threads to compute the matrix product.

On the other hand, you could use an OpenMP parallel for to speed up your code.

  #pragma omp parallel for
  for(int i = 0; i < loop_count; i++)
    cblas_sgemm(<paras group B>);

This way, the threads are forked and joined only once instead of loop_count times.
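As a rough illustration of this pattern, here is a minimal sketch. It assumes that every iteration works on its own matrices, so the iterations are independent, and that the matrices are square and stored back to back. The names `small_sgemm` and `batched_sgemm` are hypothetical. `small_sgemm` is only a naive stand-in for a cblas_sgemm call from the sequential MKL, so that the sketch builds without MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sequential cblas_sgemm call:
 * computes C = A * B for row-major n x n matrices, single-threaded. */
void small_sgemm(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Multiplies loop_count independent pairs of n x n matrices.
 * Assumption: pair i starts at offset i * n * n in a, b and c.
 * One parallel region covers the whole loop, so the threads are
 * forked and joined once rather than once per multiplication. */
void batched_sgemm(int loop_count, int n,
                   const float *a, const float *b, float *c)
{
    assert(loop_count >= 0 && n > 0);
    size_t stride = (size_t)n * (size_t)n;

    #pragma omp parallel for
    for (int i = 0; i < loop_count; i++)
        small_sgemm(n, a + i * stride, b + i * stride, c + i * stride);
}
```

With gcc, the pragma only takes effect when the code is compiled with -fopenmp. Without that flag the loop simply runs serially and produces the same result.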

If you are using the Intel compiler (icc/icpc), you can link the sequential MKL with the compiler option -mkl=sequential instead of -mkl.

If you are using another compiler such as gcc, you can use the MKL Link Line Advisor to generate the required link line options: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

kangshiyin