The following program outputs 0x20001, which is 131073 in decimal.

int main()
{
    char foo[8] = {1,0,2,0,3,0,4,0};
    long int *derp = &foo;
    printf("%d\n", sizeof(long int)); // THIS OUTPUTS 8
    printf("0x%x\n",derp[0]);
    return 0;
}

I know that:

  • The long pointer dereference (i.e. derp[0]) will interpret all 8 bytes.
  • Endianness is at play here. (I am on an Intel PC, so I am expecting little endian)

However, I cannot figure out how the above answer is reached.

Fuad
  • I think this is technically undefined behavior. – Mateen Ulhaq Oct 23 '19 at 03:15
  • @MateenUlhaq why would it be undefined? &foo is simply an address value at the end of the day. And there are 8 bytes located there, which is what a long int comprises of. – Fuad Oct 23 '19 at 03:16
  • @MateenUlhaq chars are always 1 byte = 8 bits. – Fuad Oct 23 '19 at 03:17
  • g++ is complaining that `error: cannot convert ‘char (*)[8]’ to ‘long int*’ in initialization`, so this is actually probably invalid code, not merely undefined. – Mateen Ulhaq Oct 23 '19 at 03:20
  • @MateenUlhaq use gcc without arguments. It will just give a warning, not error. I do not see how it is invalid, because addresses are addresses and bytes are bytes. – Fuad Oct 23 '19 at 03:22
  • @MateenUlhaq same output when I cast the pointer (also warning goes away) – Fuad Oct 23 '19 at 03:23
  • Using gcc 9.2.0, I get the same error. – Mateen Ulhaq Oct 23 '19 at 03:24
  • It is not true that addresses are addresses and bytes are bytes. Due to the strict aliasing rule, it is undefined behavior to use a `long int *` to point to an object that was declared as `char[8]`. https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – Jonathan Callen Oct 23 '19 at 03:34
  • @Fuad in C `char` is always 1 byte is true, but 1 byte can have more than 8 bits. See [What platforms have something other than 8-bit char?](https://stackoverflow.com/q/2098149/995714). And you have UB on both of your `printf` lines – phuclv Oct 23 '19 at 03:50

1 Answer

You're not reading all the bytes of the number because you're using the wrong format specifier.

The %x format specifier expects an unsigned int, but you're passing a long int. Formally, using the wrong format specifier invokes undefined behavior. Converting a pointer to one type into a pointer to a different type (other than a character type) and then dereferencing it is also undefined behavior. That being said, here's what is probably happening.

Assuming an int is 4 bytes and a long int is 8 bytes, only the first 4 bytes of the value are read. Assuming again that your machine is little-endian, meaning that the least significant byte of an integer comes first in memory, the value you see is those first 4 bytes in reverse order. The first 4 bytes in memory are 01 00 02 00, so the value read is 0x00020001, which is 131073 in decimal.

But again, this is undefined behavior. You would most likely see something very different if you passed in a floating point value when the format specifier expects an integer or vice versa. You could also crash your program if your char array isn't properly aligned for a long.

To fix this, change the format specifier to %lx. It expects an unsigned long int, though passing a non-negative long int is also permissible.

dbush
  • "the value you see is those first 4 bytes in the reverse order". But I cannot figure it out. Reverse order of the first 4 bytes would be 0201, which is not 131073. – Fuad Oct 23 '19 at 03:26
  • @Fuad No, it's 00 02 00 01 hex = 131073 dec, which is the first 4 bytes in reverse order. – dbush Oct 23 '19 at 03:30
  • `printf("%d\n", sizeof(long int))` is a more serious issue. `sizeof` returns `size_t` which [must be printed using `%zu`](https://stackoverflow.com/q/27296011/995714) – phuclv Oct 23 '19 at 03:49