Halfway through refactoring some code (hence the ugliness), I ended up with something equivalent to the following:
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char const *argv[]) {
    int *h = NULL;
    int size = 1024;
    int p = 4;

    h = (int*)malloc(sizeof(int) * size * p);
    printf("%p\n\n", h);

    for (int i = 0; i < p; ++i) {
        int *h = &h[i * size];
        printf("%p %d\n", h, i);
    }
    return 0;
}
Output:
0x11f3010
0x11f7020 0
0x11f7020 1
0x11f7020 2
0x11f7020 3
There are two things here I didn't expect, one of which I can't even rationalize.
The first is that the RHS of the loop-local initialization of h references the just-declared h rather than the h in the outer scope. That's a little surprising, since I expected the RHS to be evaluated before the variable comes into existence, but I suppose declaration precedes initialization, so fine. It also makes some sense: sometimes we need to initialize a circular data structure such as a linked list, where the initializer deliberately contains a self-reference, as in the sketch below.
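A minimal sketch of that legitimate self-reference case (the struct and its members are hypothetical, purely for illustration):

#include <stdio.h>

struct node { int value; struct node *next; };

int main(void) {
    /* n is already in scope inside its own initializer, so a
       one-element circular list can legitimately point back at itself. */
    struct node n = { 42, &n };
    printf("%d\n", n.next->value); /* prints 42 */
    return 0;
}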
I can't explain away the second issue except as a bug. Despite initializing the loop-local pointer with different offsets into h, h always points at the same address, namely itself. Contrast:
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char const *argv[]) {
    int *h2 = NULL;
    int size = 1024;
    int p = 4;

    h2 = (int*)malloc(sizeof(int) * size * p);
    printf("%p\n\n", h2);

    for (int i = 0; i < p; ++i) {
        int *h = &h2[i * size];
        printf("%p %d\n", h, i);
    }
    return 0;
}
Output:
0x65e010
0x65e010 0
0x65f010 1
0x660010 2
0x661010 3
... which gives the expected stride in the addresses h points to. What gives?
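Incidentally, the self-pointing effect can be reproduced in miniature in a well-defined way, without reading any indeterminate value (a sketch of mine, not from the original code):

#include <stdio.h>

int main(void) {
    void *p = NULL;
    {
        /* The inner p shadows the outer one even inside its own
           initializer, so &p yields the address of the inner p and
           p ends up pointing at itself. */
        void *p = &p;
        printf("p = %p  &p = %p\n", p, (void*)&p); /* same address twice */
    }
    return 0;
}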
Update: Curiouser and Curiouser
# clang --version
clang version 3.8.0 (tags/RELEASE_380/final)
# clang -O3 untitled.cpp -o 1 && ./1
0x11a0010
0x11a4020 0
0x11a4020 1
0x11a4020 2
0x11a4020 3
# gcc -O3 untitled.cpp -o 1 && ./1
0x2494010
0x4005a0 0
0x4015a0 1
0x4035a0 2
0x4065a0 3
# gcc --version
gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
So it seems clang surprises one way and gcc another (note that the strides in the gcc output are not constant at size*sizeof(int) = 0x1000, but are 0x1000, 0x2000, 0x3000).
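For reference, the expected constant stride is easy to verify from the values above (a trivial check, assuming nothing beyond the snippets themselves):

#include <stdio.h>

int main(void) {
    int size = 1024;
    /* expected byte distance between consecutive block starts */
    printf("0x%zx\n", sizeof(int) * size); /* 0x1000 with a 4-byte int */
    return 0;
}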
Are there any language lawyers who can vouch for what the correct behavior is here? Or is this simply undefined behavior?