I have some code that I would like to run fast, so I was hoping I could persuade gcc (g++) to vectorise some of my inner loops. My compiler flags include
-O3 -msse2 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5
but gcc fails to vectorise the most important loops, giving me the following not-really-very-verbose-at-all messages:
Not vectorized: complicated access pattern.
and
Not vectorized: unsupported use in stmt.
My questions are (1) what exactly do these mean? (How complicated does it have to be before it's too complicated? Unsupported use of what exactly?), and (2) is there any way I can get the compiler to give me even just a tiny bit more information about what I'm doing wrong?
An example of a loop that gives the "complicated access pattern" is
for (int s=0;s<N;++s)
    a.grid[s][0][h-1] = D[s] * (b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]);
and one that gives "unsupported use in stmt" is the inner loop of
for (int s=0;s<N;++s)
    for (int i=1;i<w-1;++i)
        for (int j=1;j<h-1;++j)
            a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]);
(This is the one that really needs to be optimised.) Here, a.grid and b.grid are three-dimensional arrays of floats, D is a 1D array of floats, and N, w and h are const ints.
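For anyone who wants to reproduce this, a minimal self-contained sketch of the second loop nest might look like the following. The struct name Field, the function name interior_update and the particular values of N, w and h are placeholders assumed purely for illustration, and the sketch assumes grid is a plain fixed-size float array; the real code may lay things out differently.

```cpp
// Placeholder sizes; the description above only says N, w and h are const ints.
const int N = 4;
const int w = 16;
const int h = 16;

// Hypothetical stand-in for the type of a and b:
// grid is assumed to be a plain three-dimensional array of floats.
struct Field {
    float grid[N][w][h];
};

// The interior update, copied from the loop nest above.
// D is a 1D array of N floats, one coefficient per s.
void interior_update(Field& a, const Field& b, const float* D)
{
    for (int s = 0; s < N; ++s)
        for (int i = 1; i < w - 1; ++i)
            for (int j = 1; j < h - 1; ++j)  // the loop I want vectorised
                a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1]
                                        + b.grid[s][i-1][j] + b.grid[s][i+1][j]
                                        - 4*b.grid[s][i][j]);
}
```

Compiling a file like this with the flags listed at the top should show whether the inner j loop gets the same message, although a cut-down version may not behave exactly like the full program.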