I have some code that I would like to run fast, so I was hoping I could persuade gcc (g++) to vectorise some of my inner loops. My compiler flags include

-O3 -msse2 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5

but gcc fails to vectorize the most important loops, giving me the following not-really-very-verbose-at-all messages:

Not vectorized: complicated access pattern.

and

Not vectorized: unsupported use in stmt.

My questions are (1) what exactly do these mean? (How complicated does it have to be before it's too complicated? Unsupported use of what exactly?), and (2) is there any way I can get the compiler to give me even just a tiny bit more information about what I'm doing wrong?

An example of a loop that gives the "complicated access pattern" is

for (int s=0;s<N;++s)
    a.grid[s][0][h-1] =  D[s] * (b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]);

and one that gives "unsupported use in stmt" is the inner loop of

for (int s=0;s<N;++s)
    for (int i=1;i<w-1;++i) 
        for (int j=1;j<h-1;++j) 
            a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]);

(This is the one that really needs to be optimised.) Here, a.grid and b.grid are three-dimensional arrays of floats, D is a 1D array of floats, and N, w and h are const ints.

Jonathan Leffler
N. Virgo
  • This question http://stackoverflow.com/questions/8144191/why-does-gcc-not-auto-vectorize-this-loop is related, but the answer there is very specific to that person's particular problem, whereas I'm hoping for some more general information about what these messages mean, so I hope it's OK to ask another question. – N. Virgo Nov 22 '12 at 03:35
  • I can see why the first case doesn't vectorize. In the second case, does the compiler give out any more info besides `"unsupported use in stmt"`? – Mysticial Nov 22 '12 at 04:49
  • @Mysticial if you can see why the first doesn't vectorise, please enlighten me! (I don't need it to, particularly, but it would be nice to know what's happening.) Regarding the second, no, the compiler does not give any more information than "unsupported use in stmt" and the line number. – N. Virgo Nov 22 '12 at 06:28
  • The first case involves non-sequential access since `s` isn't the index for the lowest dimension. That alone will usually block vectorization. I have no idea about the second case. I can certainly vectorize the second case. – Mysticial Nov 22 '12 at 06:55

1 Answer

Not vectorized: complicated access pattern.

The access patterns GCC treats as "uncomplicated" are accesses to consecutive elements, or strided accesses subject to certain restrictions (a single element of the group accessed in the loop, the group element count being a power of 2, and the group size being a multiple of the vector type).

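b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]

Neither sequential nor (supported) strided access: the loop runs over s, the outermost index, so successive iterations touch elements a whole w*h plane apart.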
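
To make the distinction concrete, here is a minimal sketch of the two kinds of access on an array laid out like the one in the question. The sizes `N`, `W`, `H`, the `Field` struct, and the function names are hypothetical, chosen only for illustration; whether a given GCC version vectorizes either loop depends on the version and flags.

```cpp
#include <cstddef>

// Hypothetical sizes, for illustration only.
constexpr std::size_t N = 8, W = 16, H = 16;

struct Field {
    float grid[N][W][H];
};

// Innermost index varies: successive iterations touch adjacent floats
// (unit stride), i.e. consecutive element access.
void scale_row(Field& dst, const Field& src,
               std::size_t s, std::size_t i, float k) {
    for (std::size_t j = 0; j < H; ++j)
        dst.grid[s][i][j] = k * src.grid[s][i][j];
}

// Outermost index varies: successive iterations are W*H floats apart,
// so every step jumps over a whole [W][H] plane.
void scale_across_planes(Field& dst, const Field& src,
                         std::size_t i, std::size_t j, float k) {
    for (std::size_t s = 0; s < N; ++s)
        dst.grid[s][i][j] = k * src.grid[s][i][j];
}

// Distance, in floats, between the elements touched by iterations s and s+1
// of the second loop.
constexpr std::size_t plane_stride() {
    return sizeof(Field::grid[0]) / sizeof(float);
}
```

Both functions compute the same kind of result; only the memory layout of the elements they walk over differs. If the loop over `s` is the hot one, one option is to reorder the array dimensions so that `s` becomes the innermost index.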

Not vectorized: unsupported use in stmt.

Here "use" is meant in the data-flow sense: reading the value of a variable (a register or compiler temporary). The "supported uses" in this case are variables defined in the current iteration of the loop, constants, and loop invariants.

a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]);

In this example, I think the "unsupported use" is because b.grid[s][i][j-1] and b.grid[s][i][j+1] are assigned ("defined") by a previous iteration of the loop.
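
As a rough illustration of "use" in the data-flow sense, the sketch below contrasts a loop whose uses are all either loop-invariant or defined in the current iteration with a loop that uses a value defined by the previous iteration. The function names are hypothetical, and both functions assume `out` is at least as large as `in`. This only shows the two categories; it makes no claim about which diagnostic a particular GCC version prints for either loop.

```cpp
#include <cstddef>
#include <vector>

// Every value used in the body is either loop-invariant (k) or defined
// within the same iteration (t). No iteration needs a result produced
// by an earlier one.
// Assumes out.size() >= in.size().
void scale_shift(std::vector<float>& out, const std::vector<float>& in, float k) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        float t = in[i] + 1.0f;  // defined in this iteration
        out[i] = k * t;          // uses t (current iteration) and k (invariant)
    }
}

// Iteration i uses out[i - 1], which was defined by iteration i - 1:
// a loop-carried dependence.
// Assumes out.size() >= in.size().
void running_sum(std::vector<float>& out, const std::vector<float>& in) {
    if (in.empty()) return;
    out[0] = in[0];
    for (std::size_t i = 1; i < in.size(); ++i)
        out[i] = out[i - 1] + in[i];
}
```

In the first loop the iterations can be executed in any order, or several at once; in the second, each iteration has to wait for the one before it.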

chill
  • You overlooked the same thing that I initially did. Notice that the second case is done completely out-of-place. (Read from `b` and write to `a`.) So all the iterations are indeed independent. – Mysticial Nov 22 '12 at 16:28
  • Yes, indeed. In fact, that loop is vectorized by GCC. Perhaps in the OP case, the compiler does not know that `a.grid` and `b.grid` do not alias? PS. e.g. if they are declared as `struct S { float (*grid)[P][Q]; ... };` – chill Nov 22 '12 at 16:41
  • That's possible, although I believe it will say something about possible aliasing if that really is the case. +1 for pointing out that GCC actually does do it. – Mysticial Nov 22 '12 at 16:42
  • They are declared as `struct S { float grid[N][w][h]; ... };`. In the context of my code, they are not vectorised. – N. Virgo Nov 23 '12 at 02:11
  • a and b are both declared in the global scope of the same cpp file, if that makes a difference. – N. Virgo Nov 23 '12 at 02:12