Gcc autovectorization weird behaviour in matrix multiply when arrays are function parameters

Question

I'm benchmarking different matrix multiply forms with different optimization levels (for teaching purposes) and I noticed strange behavior in gcc autovectorization: it fails to vectorize when the arrays are function parameters (see mxmp) but vectorizes when the arrays are global variables (see mxmg).

gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1), but the behaviour was the same with older gcc versions.

Compiling options: gcc -O3 -mavx2 -mfma

#define N 1024
float A[N][N], B[N][N], C[N][N];

void mxmp(float A[N][N], float B[N][N], float C[N][N]) {
  int i,j,k;
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      for (k=0; k<N; k++)
        C[i][j] = C[i][j] + A[i][k] * B[k][j];
}

void mxmg() {
  int i,j,k;
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      for (k=0; k<N; k++)
        C[i][j] = C[i][j] + A[i][k] * B[k][j];
}

int main(void) {
  mxmg();
  mxmp(A, B, C);
}

I expected the compiler to do the same in both functions; however, mxmp takes about 10 times as long to run as mxmg. Looking at the assembly code, gcc autovectorizes mxmg (where the arrays are global variables) but fails to vectorize mxmp (where the arrays are parameters).

I tried the same with the kij form, and gcc is able to vectorize both functions.

I need help understanding why gcc behaves this way, and how to help gcc (pragmas, compile options, attributes, ...) properly vectorize the mxmp function. Thanks.

note that it's also not an efficient way to multiply matrices because it's not cache friendly. See [Optimized matrix multiplication in C](https://stackoverflow.com/q/1907557/995714) — phuclv, Jun 20 '19 at 14:30

score 0 · Accepted Answer · answered Jun 20 '19 at 11:45

When the arrays are global, the compiler can easily see that they are disjoint memory regions. When they are function parameters, you could call mxmp(A,A,A), so it has to assume that writing to C may modify A or B, which could affect later iterations and complicates vectorization. Of course the compiler could inline or do other things to know it in your particular case...
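To make the aliasing point concrete, here is a minimal sketch (not from the original post) that uses the question's loop with a tiny N. The helper name aliasing_changes_result and the 2x2 test values are assumptions chosen for illustration. It shows that mxmp(M, M, M) is a legal call without restrict, and that it produces a different result from the same call on three separate copies, which is why the compiler cannot freely reorder the loads and stores.

```c
#include <string.h>

#define N 2  /* tiny size for illustration; the question uses 1024 */

/* Same ijk loop as in the question, with no restrict qualifiers. */
void mxmp(float A[N][N], float B[N][N], float C[N][N]) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i][j] = C[i][j] + A[i][k] * B[k][j];
}

/* Hypothetical helper: returns 1 if the aliased call gives a different
   result from the call on three distinct arrays with the same contents. */
int aliasing_changes_result(void) {
  float M[N][N] = {{1, 2}, {3, 4}};
  float A[N][N], B[N][N], C[N][N];

  memcpy(A, M, sizeof M);
  memcpy(B, M, sizeof M);
  memcpy(C, M, sizeof M);

  mxmp(A, B, C);  /* distinct arrays: C = M + M*M */
  mxmp(M, M, M);  /* aliased: each store to M changes later loads */

  return memcmp(C, M, sizeof M) != 0;
}
```

Because the stores to C feed back into the reads of A and B in the aliased case, any transformation that changes the order of memory accesses has to preserve that behaviour unless the compiler can prove the arrays are distinct.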

You can explicitly specify the lack of aliasing with restrict:

void mxmp(float A[restrict N][N], float B[restrict N][N], float C[restrict N][N]) {
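For reference, a complete version of that signature in context might look like the sketch below. The names mxmp_restrict and mxmp_restrict_selfcheck, the reduced N, and the identity-matrix check are assumptions added for illustration; only the restrict-qualified parameter syntax comes from the answer.

```c
#define N 8  /* small size for a quick check; the question uses 1024 */

/* Same ijk loop as in the question, but each parameter is
   restrict-qualified: the caller promises that A, B and C do not overlap. */
void mxmp_restrict(float A[restrict N][N], float B[restrict N][N],
                   float C[restrict N][N]) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i][j] = C[i][j] + A[i][k] * B[k][j];
}

/* Hypothetical self-check: A times the identity, accumulated into a
   zeroed C, must reproduce A. Returns 1 on success, 0 on mismatch. */
int mxmp_restrict_selfcheck(void) {
  static float A[N][N], I[N][N], C[N][N];

  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      A[i][j] = (float)(i * N + j);
      I[i][j] = (i == j) ? 1.0f : 0.0f;
      C[i][j] = 0.0f;
    }

  mxmp_restrict(A, I, C);  /* three distinct arrays, as restrict requires */

  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      if (C[i][j] != A[i][j])
        return 0;
  return 1;
}
```

With restrict in place, passing overlapping arrays (for example the same array for all three parameters) is undefined behaviour, so the promise has to hold at every call site. Compiling with the question's options plus -fopt-info-vec should make gcc report which loops it vectorized, which is one way to confirm the effect on a given gcc version.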

Thanks. I had assumed that restrict was only for pointers, and that for arrays whose dimensions were known at compile time, gcc assumed restrict implicitly. I see I was wrong. — Roger de Lluria, Jun 20 '19 at 12:22
