A/ Naming convention/clarification
Before giving an answer and for a better clarity It is important to have this fact in mind:
- in USA, M is used for row size and N for column size
whereas
- in some other places, like Europe, this is the reverse N is for row size and M is for column size
Comments:
All the Blas/Lapack doc you will find in netlib.org use the USA convention
I (as a European) must admit that the USA convention is more logical like indices (i,j) and (m,n) follow the same alphabetical order
To avoid such ambiguity I generally use:
- I_size for row size and J_size for column size
B/ Answers
B.1/ gemm
void cblas_zgemm(CBLAS_LAYOUT layout,
CBLAS_TRANSPOSE opA,
CBLAS_TRANSPOSE opB,
const int M, <-------------- I_Size of op(A)
const int N, <-------------- J_Size of op(B)
const int K, <-------------- J_Size of op(A)
const void* alpha,
const void* A,
const int lda,
const void* B,
const int ldb,
const void* beta,
void* C,
const int ldc);
In verbs if TRANSA = 'T' you must take the dimensions of the transposed A matrix.
The implementation to call cblas_zgemm
may look like:
const Size_t opA_I_size = (opA == CblasNoTrans) ? A.I_size() : A.J_size();
const Size_t opA_J_size = (opA == CblasNoTrans) ? A.J_size() : A.I_size();
const Size_t opB_I_size = (opB == CblasNoTrans) ? B.I_size() : B.J_size();
const Size_t opB_J_size = (opB == CblasNoTrans) ? B.J_size() : B.I_size();
cblas_zgemm(CblasColMajor,
opA,
opB,
opA_I_size,
opB_J_size,
opA_J_size,
alpha,
A.data(),
A.ld(),
B.data(),
B.ld(),
beta,
C.data(),
C.ld());
B.2/ Memory layout
For Blas/Lapack compatibility and more generally for number crunching...
never use A[I_size][J_size] but always A[I_size*J_size]
(the reason is: in one case you have an array of pointers, in the other case you have a contiguous memory block which is much more convenient for vectorization, cache friendness etc.)
To be more precise for
(where ld is the leading dimension)
Updates:
Even if you are coding in C++ I recommend to use the Fortran convention (column major). Lapacke pretends to support row major mode too however, under the hood, it simply copies your matrix into a column major layout before calling the requested subroutine. So this extra facility is only an illusion (concerning perfs). To be more precise this is the LAPACKE_dge_trans() function. You can check Lapacke code to see that this function is used nearly everywhere as soon as Layout=RowMajor
(see the lapacke_dgesv_work() code for instance).
Also note that if you want generic strides ("arbitrary leading dimension" in both I and J directions) you can use library like Blis instead of Blas. The real advantage is to be able to create arbitrary 2D-views of Tensors. This choice depends on your application, I do not know if you have tensor manipulation in mind.
B.3/ Matrix dimensions
If your matrices will always be as small as 3x10 blas/lapack is not a good choice (for perfomance). Considere using a library like Eigen or Blaz.