Warning: Actually it is not due to power of 2 but parity. See the EDIT section.
I have found a code which shows quite a strange behaviour.
The code uses a 2D array (size x size). When size is a power of 2, the code goes between 10% and 40% slower, the slowest being for size=32.
I have obtained these results with Intel compiler. If I compile with gcc 5.4, the power of 2 problem disappears but the code is 3x slower. Having tested it on different CPUs, I think it should be reproducible enough.
Code:
#define N 10000000
unsigned int tab1[TS][TS];
void simul1() {
for(int i=0; i<TS; i++)
for(int j=0; j<TS; j++) {
if(i > 0)
tab1[i][j] += tab1[i-1][j];
if(j > 0)
tab1[i][j] += tab1[i][j-1];
}
}
int main() {
for(int i=0; i<TS; i++)
for(int j=0; j<TS; j++)
tab1[j][i] = 0;
for(int i=0; i<N; i++) {
tab1[0][0] = 1;
simul1();
}
return tab1[10][10];
}
Compilation:
icc:
icc main.c -O3 -std=c99 -Wall -DTS=31 -o i31
icc main.c -O3 -std=c99 -Wall -DTS=32 -o i32
icc main.c -O3 -std=c99 -Wall -DTS=33 -o i33
gcc:
gcc main.c -O3 -std=c99 -Wall -DTS=31 -o g31
gcc main.c -O3 -std=c99 -Wall -DTS=32 -o g32
gcc main.c -O3 -std=c99 -Wall -DTS=33 -o g33
Results on a Xeon E5:
time ./icc31
4.549s
time ./icc32
6.557s
time ./icc33
5.188s
time ./gcc31
13.335s
time ./gcc32
13.669s
time ./gcc33
14.399s
My questions:
- Why is there a performance loss when the size of array is exactly 32 ?
- (Optional: Why is icc 3x faster than gcc here ?)
EDIT: Actually this is due to parity, and only applies from size >= 32. The performance difference between even and odd numbers is consistent, and decreases when size becomes bigger. Here is a more detailed benchmark :
(The y axis is number of elements per second in millions, obtained with TS² * N / size / 1000000)
My CPU has a 32KB L1 cache and 256 KB L2