2

I recently wanted to assign the content of a 4-bytes array into a 4 byte variable with byte-shifting and logical OR.

I wanted to check if casting directly the array into an int* and getting the value works. I wrote a little test program :

int main(int argc, char** argv) {
    int out = 0;
    char in[4] = {0xFF,0xFF,0xFF,0xFF};
    char *in2 = NULL;


    printf("out = %d\n",out);

    out = (int)in[0] << 24 | (int)in[1] << 16 | (int)in[2] << 8 | (int)in[3];
    printf("out = %d\n",out);
    out = 0;

    out = *((int*)in);
    printf("out = %d\n",out);

    in2 = (char*)malloc(4);
    for(int i = 0; i < 4; i++)
        in2[i] = 0xFF;


    out = 0;
    // Byte-Shifting & OR
    out = (int)in2[0] << 24 | (int)in2[1] << 16 | (int)in2[2] << 8 | (int)in2[3]; 
    printf("out = %d\n",out);
    out = 0;

    //Direct assignment by recasting
    out = *((int*)in2); 
    printf("out = %d\n",out);

    return 0;
}

The output was as expected equals to -1 :

out = 0 out = -1 out = -1 out = -1 out = -1

But I'm worried about memory alignment and the safety of this method : is it safe ? If not, is it due to compiler specification or some kind of undefined behavior ?

  • What's wrong with `memcpy`? – Kerrek SB Nov 20 '15 at 09:55
  • The problem is with the endianness of system, also. – LPs Nov 20 '15 at 09:55
  • 2
    If your `int` is 32-bit long and your `char` is unsigned, `(int)in[0] << 24` will cause an overflow and it is undefined behavior. – MikeCAT Nov 20 '15 at 09:57
  • @MikeCAT Not true. 24 left shift is inside 32 bit int. – LPs Nov 20 '15 at 10:02
  • @LPs: Partly true. It will become UB if the `char` is `>127`, independent if it is signed or unsigned. – too honest for this site Nov 20 '15 at 10:14
  • 2
    Use the shift/or, but do not use `char` array! `char can be signed or unsigned. And do not use signed integers in general here. In general, use `stdint.h` types `uint8_t`/`uint32_t`. If you finally need a signed integer, cast the result to `int32_t`. That not only has the correct width (which is not guaranteed for `char` or `int`), but also is guaranteed to use 2s complement. – too honest for this site Nov 20 '15 at 10:17
  • @Olaf I want to understand. If 0xff int casted is 24 times right shifted becomes 0xFF000000. This assigned to an int var result in -16777216. What ma I missing? – LPs Nov 20 '15 at 10:27
  • @LP: http://port70.net/~nsz/c/c11/n1570.html#6.5.7p4 So the `char` is first coerced to `int`, which is left shifted. But the value cannot be represented in an `int` (assuming the `char` was unsigned). Hmm, It might work for `signed char`, though. Have to think about that. Anyway, you would not want to use a `signed char` array for that anyway, as it causes trouble for the other positions. – too honest for this site Nov 20 '15 at 10:39
  • @LPs, the shift in your comment, above, is 'left' shifed 24 bits, not `right` shifted 24 bits – user3629249 Nov 21 '15 at 18:18
  • the `what ma I missing?` is how values (like 'int' are represented) a couple of details: 1) each bit left shifted is like multiplying the value by 2. 2) when the left most bit of the 4 byte (for a 32bit architecture) 'int' value is 1, then the value is negative – user3629249 Nov 21 '15 at 18:23
  • when calling `malloc()` 1) the returned value is type `void*` which can be assigned to any other pointer, so a cast is not need, just clutters the code and can be a real headache when debugging and maintaining the code. 2) always check (!=NULL) the returned value to assure the operation was successful. 3) the program leaks memory because the pointer returned from the call to `malloc()` is not passed to `free()` before exiting the program. – user3629249 Nov 21 '15 at 18:29
  • the posted code fails to cleanly compile for several reasons including the unused parameters `argc` and 'argv[]`. suggest declaring main as `int main( void )`. When compiling, always enable all the warnings, then fix those warnings. (for gcc, at a minimum use: `-Wall -Wextra -pedantic` ) – user3629249 Nov 21 '15 at 18:33
  • although not all will agree, I might suggest declaring a `union` of the 'int' and the 'char' array, then no copying, etc is needed. – user3629249 Nov 21 '15 at 18:35

1 Answers1

7

But I'm worried about memory alignment and the safety of this method : is it safe ? If not, is it due to compiler specification or some kind of undefined behavior ?

No, it's not safe:

out = *((int*)in);

is undefined behavior, it violates C aliasing rules and may perform unaligned access.

ouah
  • 142,963
  • 15
  • 272
  • 331
  • Thanks, I searched about aliasing rules and found this topic adding some details about it : http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – Benjamin Debotté Nov 20 '15 at 10:24
  • 2
    @BenjaminDebotté to summarize, aliasing rules say that you can only access an object through its own type, its signed / unsigned variant type, or trough a character type. – ouah Nov 20 '15 at 10:32
  • On top of alignment issues, you also run into byte ordering issues. Your code assumes big endian ordering, whereas most desktop processors today use little endian ordering. – chqrlie Nov 20 '15 at 11:55
  • 1
    This answer is correct (I was just writing the same one). The best way to solve this issue is to use `memcpy`, which most compilers will optimize to a simple load anyway. Another issue that you have is that you are assuming that an `int` is 4 bytes, which is not guaranteed in any way by the C standard. It would be better to let `out` be of type `uint32_t` and do something like `memcpy(out, in, sizeof(*out))`. And of course you will run into endianness issues as the previous comment points out. – dnaq Nov 20 '15 at 14:15