What happens when you cast a char * address to int * in C when the address is not word-aligned?

Question

I'm running this bit of code to understand pointers a little better.

#include <stdio.h>

void foo(void)
{
    int a[4] = {0, 1, 2, 3};

    printf("a[0]:%d, a[1]:%d, a[2]:%d, a[3]:%d\n", a[0], a[1], a[2], a[3]);

    int *c;

    c = a + 1;
    c = (int *)((char*) c + 1);   /* advance by one byte, not by one int */
    *c = 10;

    printf("c:%p, c+1:%p\n", (void *)c, (void *)(c+1));
    printf("a:%p, a1:%p, a2:%p, a3:%p\n", (void *)a, (void *)(a+1), (void *)(a+2), (void *)(a+3));

    printf("a[0]:%d, a[1]:%d, a[2]:%d, a[3]:%d\n", a[0], a[1], a[2], a[3]);

    printf("c[0]:%d, c[1]:%d\n", *c, *(c+1));

}

The output I get is:

a[0]:0, a[1]:1, a[2]:2, a[3]:3
c:0xbfca1515, c+1:0xbfca1519
a:0xbfca1510, a1:0xbfca1514, a2:0xbfca1518, a3:0xbfca151c
a[0]:0, a[1]:2561, a[2]:0, a[3]:3
c[0]:10, c[1]:50331648

Could someone please explain how a[1] is now 2561?

I understand that when we do this:

c = (int *) ((char *) c + 1);

c is now pointing to the 4 bytes following the first byte of a[1].

But how did a[1] end up with 2561?

I'm guessing this has to do with endianness?

Print the contents of the array as *bytes* to see it. That should explain what you're seeing. — Some programmer dude, Feb 13 '18 at 20:40
Note that the outcome of this code will be different depending on the endianness of the processor. And on some processors, the code will just crash because of alignment restrictions. In short, don't do that. — user3386109, Feb 13 '18 at 20:43
You violate strict aliasing (which is true regardless of alignment) — Christian Gibbons, Feb 13 '18 at 20:53

Jean-François Fabre · Accepted Answer · 2018-02-15T20:05:36.390


c = a + 1;

Now c points to a[1], the second element of a (value 1).

c = (int *)((char*) c + 1);

You "cheated" with pointer arithmetic: you added 1 to the address in bytes, ignoring the size of int. Note that this is illegal on old machines like the 68000, which don't tolerate multi-byte accesses at odd addresses. Other machines will do the job, albeit a lot slower, which is arguably worse because you don't notice it: for instance, it works on a 68020, just more slowly.

Now c points to the last 3 bytes of a[1] and overflows into the first byte of a[2], so when you do:

*c = 10;

Since your machine is little-endian, the 4-byte write leaves the first byte of a[1] (the 1) alone, writes 10 into the next byte and zeroes into the two after it, and then clobbers the first byte of a[2] (the 2) with a zero.

So now:

 a[1] = 1 + (10<<8) = 2561
 a[2] = 0

The result is different on a big-endian machine.

PowerPC, big-endian (assuming a 32-bit int; otherwise the result differs again):

a[1] = 0           // its last 3 bytes, including the 1, are overwritten with zeroes
a[2] = 167772162   // 0x0A000002: its first byte is overwritten with 10 (0x0A)

68000/68010:

bus error (coredump) / guru meditation

To sum it up: don't violate alignment requirements or the strict aliasing rule.

answered Feb 13 '18 at 20:45 by Jean-François Fabre, edited Feb 15 '18 at 20:05


And many RISC machines don't like misaligned access. DEC Alpha converted a misaligned access into a system trap that either aborted the program or processed the read in several parts and mangled the data to assemble the answer. That was not fast — a misaligned memory access was something to avoid at all costs. – Jonathan Leffler Feb 13 '18 at 20:48
yes, some machines tolerate misalignment, but it's more costly, especially if it's emulated in software! (sorry for the 680x0 love BTW :)) – Jean-François Fabre Feb 13 '18 at 20:49
I still have a soft spot for the 680x0 chip set. PPC is an heir and successor via the RS6000 chips, I believe. The DEC Alpha natively objected to the misaligned access; it was a software 'fix' that allowed it to continue. Personally, I think that the crash was appropriate. – Jonathan Leffler Feb 13 '18 at 20:54
They "fixed" misaligned access on the 68020 (the 68000/68010 crash on odd accesses), but that means code for the 68020 that uses only 68000 instructions may still only run on ... a 68020... how sick is that :) – Jean-François Fabre Feb 13 '18 at 20:55
