char array to long results in unexpected value

Question

I tried to convert a byte array to a long

long readAndSkipLong(char*& b)
{
    unsigned long ret = (b[0] << 56) | (b[1] << 48) | (b[2] << 40) | (b[3]<<32) | (b[4] << 24) | (b[5] << 16) | (b[6] << 8) | (b[7]);
    return ret;
}

My shifting seems to be not right. For the intended value

152  --> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 10011000

I get:

-104  --> 11111111 11111111 11111111 11111111 11111111 11111111 11111111 10011000

Any idea where the bug is?

What compiler is this? May be long is 32-bits, no? – OldProgrammer Mar 20 '15 at 17:15 — OldProgrammer, Mar 20 '15 at 17:15
64 bit windows machine, but I think 32 bit visual studio – Anthea Mar 23 '15 at 08:43 — Anthea, Mar 23 '15 at 08:43

Sufian Latif · Answer 1 · 2015-03-23T09:27:23.620

3

It's because of type promotion and sign extension. Every value in your char array is signed, and bit-shifting is an integer operation. When you use the shifting operator, it evaluates to an int, and because your chars are signed, shifting them would produce signed ints.

The last (rightmost) byte has 1 as the sign bit. When promoted to an int, its value becomes -104 by sign extension. As you ORed the rest of the numbers, all the 1 bits remained unaffected.

To avoid this problem, you can cast each chars to unsigned long before shifting and ORing.

Another thing you can do is bitwise ANDing each char with 0xff like ((b[i] & 0xff) << 24). ANDing with 0xff would produce an int, keeping the least significant 8 bits intact and zeroes to the left, no sign extension.

edited Mar 23 '15 at 09:27

answered Mar 20 '15 at 17:19

Sufian Latif

13,086
3
33
70

2

after char being promoted to int, shifting that many bits is basically UB. I'd suggest constructing two int32_t first then connect them together. Or, maybe better, `union` an `uint64_t` with a `uint6_t[8]` then use a loop to fill it in. – user3528438 Mar 20 '15 at 17:39
2

@user3528438 using `long long` will also do that. – Sufian Latif Mar 20 '15 at 17:42
casting each char to long had no effect - could you post code? – Anthea Mar 23 '15 at 09:01
@Anthea sorry I was wrong at that - when you cast `char` to `long`, both types are signed and sign extension happens. You can cast to `unsigned long`, or AND the value with `0xff` first. I've updated my answer. – Sufian Latif Mar 23 '15 at 09:30
I implemented as Sid Heroor intended and casted each char to uint8_t. Afterwards I tried your solution with the ANDing with 0xff and it also works. I am not sure which solution is better... – Anthea Mar 23 '15 at 09:49
casting to `unsigned long` won't work because `long` is a 32-bit type on Windows and can't store 64 bits – phuclv Nov 08 '20 at 16:27

phuclv · Answer 2 · 2020-11-08T16:26:44.633

2 things:

char can be signed or unsigned, hence shouldn't be used storing for data types other than characters.

In C, C++ and most C-like languages, any types narrower than int must be promoted to int in an expression, and your statement will be treated like this
```
 unsigned long ret = ((int)b[0] << 56) | ((int)b[1] << 48)
                   | ((int)b[2] << 40) | ((int)b[3] << 32)
                   | ((int)b[4] << 24) | ((int)b[5] << 16)
                   | ((int)b[6] <<  8) | ((int)b[7]);
```
If char is signed, it'll be promoted to int using sign extension. As a result the top bits will be filled with 1s if the byte value is negative.

In MSVC char is signed by default. You can use /J to make char unsigned, which will solve part of your problem. But then another problem arises:
In Windows long is a 32-bit type, consequently you can't pack 8 bytes into it. Moreover int is also 32-bit on most modern systems, and after promoting b[i] to int shifting more than 31 is undefined behavior which is what your program does.

So to fix all the problems portably you need to:

Cast all b[i] to unsigned char or uint8_t, or masking the high bits by ANDing it with 0xFF like 0605002 suggested. Or simply change the type of b to unsigned char&* instead of char&*
Change the return type to a type at least 64-bit wide like (unsigned) long long, (u)int64_t or (u)int_least64_t

The result may look like this

uint64_t readAndSkipLong(unsigned char*& b)
{
    return ((uint64_t)b[0] << 56) | ((uint64_t)b[1] << 48)
         | ((uint64_t)b[2] << 40) | ((uint64_t)b[3] << 32)
         | ((uint64_t)b[4] << 24) | ((uint64_t)b[5] << 16)
         | ((uint64_t)b[6] <<  8) | ((uint64_t)b[7]);
}

or

uint64_t readAndSkipLong(char*& b)
{
    return ((uint64_t)(uint8_t)b[0] << 56) | ((uint64_t)(uint8_t)b[1] << 48)
         | ((uint64_t)(uint8_t)b[2] << 40) | ((uint64_t)(uint8_t)b[3] << 32)
         | ((uint64_t)(uint8_t)b[4] << 24) | ((uint64_t)(uint8_t)b[5] << 16)
         | ((uint64_t)(uint8_t)b[6] <<  8) | ((uint64_t)(uint8_t)b[7]);
}

However you don't actually need to write a function to reverse endian. There are already ntohll() and htonll() for that purpose

reversedEndian = ntohll(originalValue);

If the input must be a char array then just copy the value to a uint64_t

memcpy(&originalValue, &b, sizeof originalValue);
reversedEndian = ntohll(originalValue);

You can further reduce the whole thing to reversedEndian = ntohll(*(int64_t*)&b); if strict aliasing is allowed because on x86 unaligned access is generally permitted

score -1 · Answer 3 · edited Nov 05 '20 at 04:17

Couple of things to think about

include cstdint and use std::uint64_t and std::uint8_t for you types so that there is no issues with sign.
The logic also depends on whether your machine is little endian or big endian. For Little Endian machines, you need to put your least significant byte first and then go higher. Your logic is for Big Endian.
It's possible that you are having a count overflow. A better way would be to explicitly declare a uint64_t and use that.

Here's a bit of code I wrote for bytes to uint64_t on a little endian machine.

std::uint64_t bytesToUint64(std::uint8_t* b) {
    std::uint64_t msb = 0x0u;
    for (int i(0); i < 7; i++) {
        msb |= b[i];
        msb <<= 8;
    }
    msb |= b[7];

    return msb;
}

EDIT by OP (implemented Tip 1):

long readAndSkipLong(char*& b)
{
    std::uint64_t ret = 
        ((std::uint8_t)b[0] << 56) | 
        ((std::uint8_t)b[1] << 48) | 
        ((std::uint8_t)b[2] << 40) | 
        ((std::uint8_t)b[3] << 32) | 
        ((std::uint8_t)b[4] << 24) | 
        ((std::uint8_t)b[5] << 16) | 
        ((std::uint8_t)b[6] <<  8) | 
        ((std::uint8_t)b[7]);
    b+=8;

    return ret;
}

thanks, actually your code does not work for me but your tipp to use std::uint64_t and std::uint8_t I updated your answer with my working code for future references! — Anthea, Mar 23 '15 at 09:40
`(std::uint8_t)b[0] << 56` will **not** work because `(std::uint8_t)b[0]` will be promoted to int which doesn't contain 64 bits. And you're still returning `long` which is not a 64-bit type on Windows — phuclv, Sep 13 '18 at 04:33

char array to long results in unexpected value

3 Answers3