
Halfway through refactoring some code (hence the ugliness), I ended up with something equivalent to the following:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char const *argv[]) {
    int *h = NULL;
    int size = 1024;
    int p = 4;
    h = (int*)malloc(sizeof(int) * size * p);

    printf("%p\n\n", h );

    for (int i = 0; i < p; ++i) {
        int *h = &h[i * size];
        printf("%p %d\n", h, i );
    }
    return 0;
}

Output:

0x11f3010

0x11f7020 0
0x11f7020 1
0x11f7020 2
0x11f7020 3

There are two things which I didn't expect, one of which I can't even rationalize.

The first is that the RHS of the loop-local initialization of h references the just-created h instead of the h in the outer scope. That's a little surprising, since I expected the RHS to be evaluated prior to the creation of the variable, but I guess initialization follows creation; OK then. It also makes sense considering that we sometimes need to initialize a circular data structure such as a linked list, where the initial value might deliberately contain a self-reference.

I can't explain away the second issue except as a bug. Despite initializing the loop-local pointer with different offsets into h, h always points at the same address, namely itself. Contrast:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char const *argv[]) {
    int *h2 = NULL;
    int size = 1024;
    int p = 4;
    h2 = (int*)malloc(sizeof(int) * size * p);

    printf("%p\n\n", h2 );

    for (int i = 0; i < p; ++i) {
        int *h = &h2[i * size];
        printf("%p %d\n", h, i );
    }
    return 0;
}

Output:

0x65e010

0x65e010 0
0x65f010 1
0x660010 2
0x661010 3

... which gives the expected stride in the addresses h points to. What gives?

Update: Curiouser and Curiouser

# clang --version    
clang version 3.8.0 (tags/RELEASE_380/final)

# clang -O3 untitled.cpp -o 1 && ./1
0x11a0010

0x11a4020 0
0x11a4020 1
0x11a4020 2
0x11a4020 3

# gcc -O3 untitled.cpp -o 1  && ./1
0x2494010

0x4005a0 0
0x4015a0 1
0x4035a0 2
0x4065a0 3

# gcc --version
gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)

So it seems clang is surprising in one way and gcc in another (note that the strides in gcc are not constant at size*sizeof(int) = 0x1000, but are 0x1000, 0x2000, 0x3000).

Are there any language lawyers here who can vouch for what the correct thing would be here? Is this simply undefined behavior?

  • 1
    `int *h = &h[i * size];` is undefined behaviour so getting a surprising result is completely unsurprising! `clang` perhaps detects the usage of an uninitialised variable and just leaves `h` uninitialised, but `gcc` maybe leaves `h` pointing at something random on the stack. – Ken Y-N Jan 23 '17 at 02:51
  • Ah, so the shadowing bit is just a red herring. Yeah, I guess it's a dupe. – Lauren Berrigan Jan 23 '17 at 02:59
  • 1
    Just an FYI, you really should *not* [cast the return value of `malloc()`](http://stackoverflow.com/a/605858/1701799). Also, I'm not seeing any difference in your two code snippets? Am I just blind? Another FYI, when printing pointers, you are supposed to cast to `void *`, as that is what the `%p` specifier expects, and the cast is not implicit because `printf` is variadic. E.g., `printf("%p\n", (void *)h);`. – RastaJedi Jan 23 '17 at 05:21
