3

In this link from netlib it specifies M as:

On entry, M specifies the number of rows of the matrix op( A ) and of the matrix C. M must be at least zero. Unchanged on exit.

So if I want to use a 3x10 matrix as A but I want to use it's conjugate for zgemm (TRANSA = 'C') what should I enter as M? 3 or 10?

Also when I used other LAPACK routines I entered 2D matrices as 1D like A[3*3] instead of A[3][3] and upon calling the routine I just used A for the matrix, Can I do the same with non-square matrices? A[3*10] instead of A[3][10]?

I code in C++.

Alireza
  • 410
  • 3
  • 17

1 Answers1

6

A/ Naming convention/clarification

Before giving an answer and for a better clarity It is important to have this fact in mind:

  • in USA, M is used for row size and N for column size

whereas

  • in some other places, like Europe, this is the reverse N is for row size and M is for column size

Comments:

  • All the Blas/Lapack doc you will find in netlib.org use the USA convention

  • I (as a European) must admit that the USA convention is more logical like indices (i,j) and (m,n) follow the same alphabetical order

To avoid such ambiguity I generally use:

  • I_size for row size and J_size for column size

B/ Answers

B.1/ gemm

void cblas_zgemm(CBLAS_LAYOUT layout,
                 CBLAS_TRANSPOSE opA,
                 CBLAS_TRANSPOSE opB,
                 const int M, <-------------- I_Size of op(A) 
                 const int N, <-------------- J_Size of op(B)
                 const int K, <-------------- J_Size of op(A)
                 const void* alpha,
                 const void* A,
                 const int lda,
                 const void* B,
                 const int ldb,
                 const void* beta,
                 void* C,
                 const int ldc);

In verbs if TRANSA = 'T' you must take the dimensions of the transposed A matrix.

The implementation to call cblas_zgemm may look like:

const Size_t opA_I_size = (opA == CblasNoTrans) ? A.I_size() : A.J_size();
const Size_t opA_J_size = (opA == CblasNoTrans) ? A.J_size() : A.I_size();

const Size_t opB_I_size = (opB == CblasNoTrans) ? B.I_size() : B.J_size();
const Size_t opB_J_size = (opB == CblasNoTrans) ? B.J_size() : B.I_size();

cblas_zgemm(CblasColMajor,
            opA,
            opB,
            opA_I_size,
            opB_J_size,
            opA_J_size,
            alpha,
            A.data(),
            A.ld(),
            B.data(),
            B.ld(),
            beta,
            C.data(),
            C.ld());

B.2/ Memory layout

For Blas/Lapack compatibility and more generally for number crunching...

never use A[I_size][J_size] but always A[I_size*J_size]

(the reason is: in one case you have an array of pointers, in the other case you have a contiguous memory block which is much more convenient for vectorization, cache friendness etc.)

To be more precise for

  • column major (Fortran style) you have: A[ld*J_size]

  • row major (C style) you have: A[I_size*ld]

(where ld is the leading dimension)

Updates:

  • Even if you are coding in C++ I recommend to use the Fortran convention (column major). Lapacke pretends to support row major mode too however, under the hood, it simply copies your matrix into a column major layout before calling the requested subroutine. So this extra facility is only an illusion (concerning perfs). To be more precise this is the LAPACKE_dge_trans() function. You can check Lapacke code to see that this function is used nearly everywhere as soon as Layout=RowMajor (see the lapacke_dgesv_work() code for instance).

  • Also note that if you want generic strides ("arbitrary leading dimension" in both I and J directions) you can use library like Blis instead of Blas. The real advantage is to be able to create arbitrary 2D-views of Tensors. This choice depends on your application, I do not know if you have tensor manipulation in mind.

B.3/ Matrix dimensions

If your matrices will always be as small as 3x10 blas/lapack is not a good choice (for perfomance). Considere using a library like Eigen or Blaz.

Picaud Vincent
  • 10,518
  • 5
  • 31
  • 70
  • Awesome explanation. Thank you! My matrices are quite large I was just using 3 and 10 as examples. – Alireza Sep 22 '17 at 09:20
  • 1
    @Alireza I just added some "updates" concerning memory layout / Lapacke (see end of B.2) – Picaud Vincent Sep 22 '17 at 13:53
  • 2
    This is an excellent explanation. It's worth noting that you can simplify your thinking quite a bit by observing that M and N are *always* the row and column counts of C, and that K is always the dimension being multiplied over. I find this much, much simpler to keep track of than trying to reason about M and N in terms of the dimensions of A and B, because it's not effected by transpose. – Stephen Canon Sep 22 '17 at 14:53
  • 2
    @StephenCanon I do agree and you are absolutely right. I will clean my post tomorrow as soon as I have some time (I'm a bit in a hurry right now). Thank you for your positive & constructive feedback. – Picaud Vincent Sep 22 '17 at 14:59