24-bit to 32-bit conversion in C++

Question

I need to convert a 24-bit integer (two's complement) to a 32-bit integer in C++. I found a solution here, which is given as

int interpret24bitAsInt32(unsigned char* byteArray)
 {     
    return (  
        (byteArray[0] << 24)
    |   (byteArray[1] << 16)
    |   (byteArray[2] << 8)
    ) >> 8;  
}

Although it seems to work, I have a concern about this code. byteArray[0] is only 8 bits wide, so how can an operation like byteArray[0] << 24 be possible? It would work if the compiler up-converts byteArray[0] to an integer before performing the operation, which may be why it works now. But is this behaviour guaranteed by all compilers and explicitly mentioned in the standard? It is not obvious to me, since we never explicitly tell the compiler that the target is a 32-bit integer!

Also, please let me know whether any improvement such as vectorization is possible to increase the speed (maybe using C++11), as I need to convert a huge amount of 24-bit data to 32-bit.

Is [this](https://stackoverflow.com/questions/30473958/what-is-going-on-with-bitwise-operators-and-integer-promotion) helpful? — ChrisD, Dec 15 '19 at 17:15
Note that the code you presented above is not C++ since `byte[] byteArray` is not valid C++ syntax. Is this really about C++ or maybe some other language!? What type is `byte`? — Michael Kenzel, Dec 15 '19 at 17:19
`byteArray[0] << 24` invokes undefined behavior, if `byteArray[0] > 127`, because it overflows. — mch, Dec 15 '19 at 17:21
@MichaelKenzel: there is [std::byte](https://en.cppreference.com/w/cpp/types/byte) (C++17) — Jarod42, Dec 15 '19 at 17:22
`char *` is even worse than the syntax error with `[]` between type and name. — mch, Dec 15 '19 at 17:23
If you're *just* worried about the 32-bitness of the target `int` then you can use `size_t s = sizeof(int)` and change the `<<` values accordingly. If you're also concerned about a byte *not* being 8 bits (which *is* allowed) then you'll also need to do some tricks with `CHAR_BIT` (defined in `limits.h`). — Adrian Mole, Dec 15 '19 at 17:25
@AdrianMole I am worried how the compiler can up-convert unsigned char to int. — Soo, Dec 15 '19 at 17:30
Are you asking about the how? By copying the 8bit value into an 32bit register. Or are you asking about the why or if it is mandatory? — mch, Dec 15 '19 at 17:33
I'm not 100% sure, but I think the `<<` operator *promotes* its arguments before it does the shift. Others here will be more certain. Certainly, the `24` literal is an `int`, ***not*** anything shorter. — Adrian Mole, Dec 15 '19 at 17:37

mch · Answer 1 · 2019-12-15T19:16:34.273

1

int32_t interpret24bitAsInt32(unsigned char* byteArray)
{     
    int32_t number =
        (((int32_t)byteArray[0]) << 16)
    |   (((int32_t)byteArray[1]) << 8)
    |   byteArray[2];
    if (number >= ((int32_t)1) << 23)
        //return (uint32_t)number | 0xFF000000u;
        return number - 16777216;
    return number;
}

This function should do what you want while avoiding the undefined behavior caused by shifting a 1 into the sign bit of int.
The int32_t cast is only necessary if sizeof(int) < 4; otherwise the default integer promotion to int happens.

If someone does not like the if: It does not get translated to a conditional jump by the compiler (gcc 9.2): https://godbolt.org/z/JDnJM2
It leaves a cmovg.
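
Since the question mentions converting a large amount of data, the same approach could be applied to a whole buffer in one loop, which also gives the compiler a chance to vectorize. This is only a minimal sketch: the name `convertPacked24` is hypothetical, and it assumes tightly packed 3-byte samples with the most significant byte first (as in the function above), 8-bit bytes, and an int of at least 32 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts `count` packed 24-bit samples (3 bytes each,
// most significant byte first) into sign-extended 32-bit values.
inline void convertPacked24(const unsigned char* src, std::int32_t* dst, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char* p = src + 3 * i;
        // Assemble the 24-bit pattern; no shift reaches the sign bit.
        const std::int32_t raw =
              (static_cast<std::int32_t>(p[0]) << 16)
            | (static_cast<std::int32_t>(p[1]) << 8)
            |  static_cast<std::int32_t>(p[2]);
        // Patterns with bit 23 set represent negative values: subtract 2^24.
        dst[i] = (raw >= (INT32_C(1) << 23)) ? raw - (INT32_C(1) << 24) : raw;
    }
}
```

Whether this loop is actually vectorized depends on the compiler and its flags, so it is worth inspecting the generated code and measuring.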

edited Dec 15 '19 at 19:16

answered Dec 15 '19 at 17:57


You still have undefined behavior in your uint32_t to int32_t conversion in your if statement. – afic Dec 15 '19 at 18:23
@afic Implementation defined behaviour I think – Daniel Jour Dec 15 '19 at 18:44
@afic It is implementation defined, but I found a better solution, simply subtract `1 << 24` (the 24bit number worth sign bit). – mch Dec 15 '19 at 18:45

Michael Kenzel · Answer 2 · 2019-12-15T19:39:03.467

Integral promotions [conv.prom] are performed on the operands of a shift expression [expr.shift]/1. In your case, that means that your values of type unsigned char will be converted to type int before << is applied [conv.prom]/1. Thus, the C++ standard guarantees that the operands be "up-converted".

However, the standard only guarantees that int has at least 16 bits. There is also no guarantee that unsigned char has exactly 8 bits (it may have more). Thus, it is not guaranteed that int is always large enough to represent the result of these left shifts. If int does not happen to be large enough, the resulting signed integer overflow will invoke undefined behavior [expr]/4. Chances are that int has 32 bits on your target platform and, thus, everything works out in the end.
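
If the code is going to rely on those platform properties anyway, they can be checked at compile time so that a mismatch fails the build instead of silently misbehaving. A minimal sketch; the helper name `platformMatchesAssumptions` is made up for illustration:

```cpp
#include <climits>
#include <limits>

// Fail the build early if the platform does not match what the shift code assumes.
static_assert(CHAR_BIT == 8, "expected 8-bit bytes");
static_assert(std::numeric_limits<int>::digits >= 31,
              "expected int to have at least 31 value bits plus a sign bit");

// Hypothetical helper so the same assumptions can also be queried in ordinary code.
constexpr bool platformMatchesAssumptions()
{
    return CHAR_BIT == 8 && std::numeric_limits<int>::digits >= 31;
}
```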

If you need to work with a guaranteed, fixed number of bits, I would generally recommend using fixed-width integer types, for example:

std::int32_t interpret24bitAsInt32(const std::uint8_t* byteArray)
{     
    return
        static_cast<std::int32_t>(
            (std::uint32_t(byteArray[0]) << 24) | 
            (std::uint32_t(byteArray[1]) << 16) | 
            (std::uint32_t(byteArray[2]) <<  8)
        ) >> 8;
}

Note that right shift of a negative value is currently implementation-defined [expr.shift]/3. Thus, it is not strictly guaranteed that this code will end up performing sign extension on a negative number. However, your compiler is required to document what exactly right-shifting a negative integer does [defns.impl.defined] (i.e., you can go and make sure it does what you need). And I have never heard of a compiler that does not implement right shift of a negative value as an arithmetic shift in practice. Also, it looks like C++20 is going to mandate arithmetic shift behavior…
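
If relying on the implementation-defined right shift is a concern, the sign extension can also be done with plain arithmetic. The sketch below swaps in the common flip-and-subtract technique rather than the shift used above: it assembles the 24-bit pattern in an unsigned type, flips bit 23, and subtracts 2^23. The function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical variant: sign-extends without right-shifting a negative value.
inline std::int32_t interpret24bitAsInt32Portable(const std::uint8_t* byteArray)
{
    // Assemble the 24-bit pattern in an unsigned type; no shift reaches bit 31.
    const std::uint32_t raw =
          (std::uint32_t(byteArray[0]) << 16)
        | (std::uint32_t(byteArray[1]) <<  8)
        |  std::uint32_t(byteArray[2]);
    // Flipping bit 23 keeps the value in [0, 2^24); subtracting 2^23 then
    // yields the two's-complement value in [-2^23, 2^23).
    return static_cast<std::int32_t>(raw ^ 0x800000u) - 0x800000;
}
```

The conversion to std::int32_t is value-preserving here because the operand is always below 2^24.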

`(std::int32_t(byteArray[0]) << 24` is undefined if `byteArray[0] > 127`, it will shift a `1` into the sign bit of `int32_t`. — mch, Dec 15 '19 at 17:58

Igor Tandetnik · Answer 3 · 2019-12-15T18:11:38.983

[expr.shift]/1 The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand...

[conv.prom] 7.6 Integral promotions

1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

So yes, the standard requires that an argument of a shift operator, that has the type unsigned char, be promoted to int before the evaluation.
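
The promotion can be observed directly by asking the compiler for the type of the shift expression. A small sketch, assuming a platform where int can represent every unsigned char value (otherwise the promoted type would be unsigned int and the assertions would fail); the alias name `ShiftResult` is only illustrative:

```cpp
#include <type_traits>
#include <utility>

// Type of `unsigned char << int`: that of the promoted left operand.
using ShiftResult = decltype(std::declval<unsigned char>() << 24);

static_assert(std::is_same<ShiftResult, int>::value,
              "unsigned char is promoted to int before shifting");

// The right operand's type does not influence the result type.
static_assert(std::is_same<decltype(std::declval<unsigned char>() << 24u), int>::value,
              "result type follows the promoted left operand only");
```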

That said, the technique in your code relies on int a) being 32 bits large, and b) using two's-complement to represent negative values. Neither of which is guaranteed by the standard, though it's common with modern systems.

Daniel Jour · Answer 4 · 2019-12-15T19:29:09.570

A version without a branch, but with a multiplication:

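int32_t interpret24bitAsInt32(unsigned char* bytes) {
  // 0xFF if the 24-bit sign bit is set, 0x00 otherwise
  unsigned char msb = UINT8_C(0xFF) * (bytes[0] >> 7);
  // Cast before shifting: the result type of a shift is that of its promoted
  // left operand, so without the cast msb << 24 would be computed in int
  uint32_t number =
        (uint32_t(msb) << 24)
      | (uint32_t(bytes[0]) << 16)
      | (uint32_t(bytes[1]) << 8)
      |  bytes[2];
  return number;
}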

You need to test if omitting the branch really gives you a performance advantage, though!

Adapted from older code of mine which did this for 10-bit numbers. Test before use!

Oh, and it still relies upon implementation-defined behaviour with regard to the conversion from uint32_t to int32_t. If you want to go down that rabbit hole, have fun, but be warned.

Or, much simpler: use the trick from mch's answer, and also use shifts instead of multiplication:

int32_t interpret24bitAsInt32(unsigned char* bytes) {
  int32_t const number =
        (bytes[0] << INT32_C(16))
      | (bytes[1] << INT32_C(8))
      |  bytes[2];
  int32_t const correction = 
     (bytes[0] >> UINT8_C(7)) << INT32_C(24);
  return number - correction;
}

Test case
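
With only 2^24 possible inputs, an exhaustive check is cheap enough to run directly. The following is a sketch of such a check, not a reproduction of any linked test: it repeats the second variant above (with a const parameter) so the block is self-contained, and the checker name `allPatternsConvertCorrectly` is hypothetical. It assumes 8-bit bytes and an int of at least 32 bits.

```cpp
#include <cstdint>

// Second variant from above, repeated so the check is self-contained.
inline std::int32_t interpret24bitAsInt32(const unsigned char* bytes)
{
    const std::int32_t number =
          (bytes[0] << INT32_C(16))
        | (bytes[1] << INT32_C(8))
        |  bytes[2];
    const std::int32_t correction = (bytes[0] >> 7) << INT32_C(24);
    return number - correction;
}

// Hypothetical checker: returns true if every 24-bit pattern converts correctly.
inline bool allPatternsConvertCorrectly()
{
    for (std::uint32_t v = 0; v < (UINT32_C(1) << 24); ++v) {
        const unsigned char bytes[3] = {
            static_cast<unsigned char>(v >> 16),
            static_cast<unsigned char>((v >> 8) & 0xFFu),
            static_cast<unsigned char>(v & 0xFFu)
        };
        // Reference value: patterns with bit 23 set represent v - 2^24.
        const std::int64_t expected = (v < (UINT32_C(1) << 23))
            ? std::int64_t(v)
            : std::int64_t(v) - (INT64_C(1) << 24);
        if (interpret24bitAsInt32(bytes) != expected) {
            return false;
        }
    }
    return true;
}
```

The same loop can be pointed at any of the other variants in this thread by swapping the function under test.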

score 0 · Answer 5 · answered Dec 15 '19 at 19:35

There is indeed integral promotion for types smaller than int when they are used as operands of arithmetic operators.

So assuming sizeof(char) < sizeof(int)

in

byteArray[0] << 24

byteArray[0] is promoted to int, and the bit shift is performed on an int.

The first issue is that int may be only 16 bits wide.

The second issue (before C++20) is that int is signed, and bitwise shifts on signed values can easily lead to implementation-defined or undefined behavior (you get both for negative 24-bit numbers).

In C++20, the behavior of bitwise shifts has been simplified (it is now defined), and the problematic UB has been removed too.

The leading 1 bits of a negative number are kept in neg >> 8.

So before C++20, you have to do something like:

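std::int32_t interpret24bitAsInt32(const unsigned char* byteArray)
{
    // Assemble the 24-bit pattern; no shift reaches the sign bit of int32_t
    const std::int32_t res =
        (std::int32_t(byteArray[0]) << 16)
      | (byteArray[1] << 8)
      | byteArray[2];
    // Largest non-negative 24-bit two's-complement value: 2^23 - 1
    const std::int32_t int24Max = (std::int32_t(1) << 23) - 1;
    return res <= int24Max ?
               res : // Non-negative 24-bit numbers
               res - (std::int32_t(1) << 24); // Negative numbers: subtract 2^24
}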
