
Take a look at this code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>  /* needed for int32_t / uint32_t */

int byteToInt(char *bytes) {
    int32_t v = 
        (bytes[0]      ) +
        (bytes[1] << 8 ) +
        (bytes[2] << 16) +
        (bytes[3] << 24);
    return v;
}

int main() {
    char b1[] = {0xec, 0x51, 0x04, 0x00};
    char b2[] = {0x0c, 0x0c, 0x00, 0x00};

    printf("%d\n", byteToInt(b1));
    printf("%d\n", byteToInt(b2));
    printf("%d\n", *(uint32_t *)b1);
    printf("%d\n", *(uint32_t *)b2);

    return 0;
}

{0xec, 0x51, 0x04, 0x00} is equal to 283116, but when I use the byteToInt function, it returns, for some reason, 282860. Some byte arrays cause similar trouble, and I noticed the value is always off by 256. Still, most cases work without any problems: just take a look at b2, which is calculated as 3084, and that is correct. The casting method works perfectly in these cases, but I'd like to know why the described problems happen. Could someone please explain this to me?

solusipse
  • Casting a `char` array to `uint32_t *` is undefined behavior as it violates [strict aliasing](http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). It's a good way to get a `SIGBUS` on a machine that has strict alignment requirements. – Andrew Henle Jun 04 '16 at 01:22

2 Answers

4

Perhaps `char` is a signed type on your platform (whether plain `char` is signed is implementation-defined), so `(int)(char)0xec` is `-20`, while `(int)(unsigned char)0xec` is `236`, a difference of exactly 256, which matches the error you see.
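You can see the difference with a tiny test (a minimal demonstration; it assumes plain `char` is signed on your platform, which is what your output suggests):

#include <stdio.h>

int main(void) {
    char c = 0xec;                          /* -20 if plain char is signed */
    printf("%d\n", (int)c);                 /* likely -20 */
    printf("%d\n", (int)(unsigned char)c);  /* 236, i.e. -20 + 256 */
    return 0;
}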

Try using `unsigned char` and `uint32_t`:

#include <stdio.h>
#include <stdint.h>

uint32_t byteToInt(unsigned char *bytes) {
    uint32_t v =
        ((uint32_t)bytes[0]) +
        ((uint32_t)bytes[1] << 8) +
        ((uint32_t)bytes[2] << 16) +
        ((uint32_t)bytes[3] << 24);
    return v;
}

int main() {
    unsigned char b1[] = { 0xec, 0x51, 0x04, 0x00 };
    unsigned char b2[] = { 0x0c, 0x0c, 0x00, 0x00 };

    printf("%u\n", byteToInt(b1));     // 'u' for unsigned
    printf("%u\n", byteToInt(b2));
    //printf("%u\n", *(uint32_t *)b1); // undefined behavior
    //printf("%u\n", *(uint32_t *)b2); // ditto

    return 0;
}

Note that reinterpreting memory contents, as done in the last two (commented-out) printfs, is undefined behavior, although it often works in practice.
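If you do need to reinterpret a byte buffer, one well-defined alternative is to memcpy it into the target type (a sketch; note the result still depends on the host's byte order):

#include <stdint.h>
#include <string.h>

uint32_t byteToIntMemcpy(const unsigned char *bytes) {
    uint32_t v;
    memcpy(&v, bytes, sizeof v);  /* well-defined, unlike the pointer cast */
    return v;                     /* value depends on host endianness */
}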

BTW, shifting signed negative values is undefined according to the standard:

The result of E1 << E2 is E1 left-shifted E2 bit positions; ... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

AlexD
  • That's right, it was signed char in my case, your code works flawlessly. Thank you! – solusipse Jun 04 '16 at 00:21
  • @solusipse Please keep in mind that shifting signed values could be fragile. (I just updated the answer.) – AlexD Jun 04 '16 at 01:01
  • Under the reasonable assumption that char is 8 bits and int is 32 bits, doesn't `bytes[3] << 24` cause UB when the top bit of `bytes[3]` is set? `bytes[3]` will be promoted to `int` before the shift and `bytes[3] << 24` will result in an overflow. `((uint32_t)bytes[3]) << 24` is correct and avoids UB. – Paul Hankin Jun 04 '16 at 04:20
  • @solusipse I added a small correction to prevent promotion to `int`, thanks to **@PaulHankin**. – AlexD Jun 04 '16 at 15:23
0

There are several potential issues with this code. The first is that the width of the char type is implementation-defined: it is at least 8 bits, but on some platforms it can be 16 or even 32 bits, and whether plain char is signed is also up to the compiler. When you do a shift operation on a char, it is first promoted to int, so a negative value is sign-extended before the shift, and a shift that overflows the promoted type is undefined behavior.

It is safer to first cast the values to an unsigned 32-bit (or wider) type before shifting and adding them, going through unsigned char first so that a signed char does not sign-extend. For example:

unsigned long v =
    ((unsigned long)(unsigned char)bytes[0]      ) +
    ((unsigned long)(unsigned char)bytes[1] << 8 ) +
    ((unsigned long)(unsigned char)bytes[2] << 16) +
    ((unsigned long)(unsigned char)bytes[3] << 24);

Note that int32_t is not compiler-specific: it is a standard C99 type from <stdint.h> and is guaranteed to be exactly 32 bits (your code needs that header for it to compile). Plain "int", on the other hand, is compiler dependent: older compilers may have it as a 16-bit value, since the standard only requires at least 16 bits and it is typically the natural word size of the machine. Using "long" instead of "int" guarantees at least 32 bits.

Additionally, I used "unsigned long" in the example, because I don't think you want to deal with negative numbers in this case. In two's complement binary representation, negative numbers have the highest bit set (0x80000000 for a 32-bit value).

If you do want to use negative numbers, then the type should be "long" instead, although this opens a different can of worms when adding positive-valued bytes to a negative-valued highest byte. In that case you should do a wholly different conversion that peels off the high bit of the high byte, does the addition, and then, if the high bit was set, negates the value (v = -v;) and subtracts 1, because of the two's complement representation of negative numbers (which is probably outside the scope of this question).
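For illustration, here is a sketch of that conversion (a hypothetical helper name; it assumes long is at least 32 bits, which the standard guarantees):

/* Hypothetical helper: interpret the low 32 bits of v as a
   two's complement signed number without relying on overflow. */
long toSigned32(unsigned long v) {
    if (v & 0x80000000UL)                      /* high bit set: negative */
        return -(long)(~v & 0x7FFFFFFFUL) - 1; /* flip low bits, negate, subtract 1 */
    return (long)v;
}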

The revised code, then, would be:

#include <stdio.h>
#include <stdlib.h>

unsigned long byteToInt(char *bytes) {
    /* cast through unsigned char first to avoid sign extension,
       then widen to unsigned long before shifting */
    unsigned long v =
        ((unsigned long)(unsigned char)bytes[0]      ) +
        ((unsigned long)(unsigned char)bytes[1] << 8 ) +
        ((unsigned long)(unsigned char)bytes[2] << 16) +
        ((unsigned long)(unsigned char)bytes[3] << 24);
    return v;
}

int main() {
    char b1[] = {0xec, 0x51, 0x04, 0x00};
    char b2[] = {0x0c, 0x0c, 0x00, 0x00};

    printf("%d\n", byteToInt(b1));
    printf("%d\n", byteToInt(b2));
    printf("%d\n", *(unsigned long *)b1);
    printf("%d\n", *(unsigned long *)b2);

    return 0;
}