2

I need to convert a 24-bit integer (two's complement) to a 32-bit integer in C++. I have found a solution here, which is given as

int interpret24bitAsInt32(unsigned char* byteArray)
 {     
    return (  
        (byteArray[0] << 24)
    |   (byteArray[1] << 16)
    |   (byteArray[2] << 8)
    ) >> 8;  
}

Although it works, I have a concern about this code. byteArray[0] is only 8 bits wide, so how can an operation like byteArray[0] << 24 be possible? It would be possible if the compiler up-converts byteArray[0] to an integer before doing the operation, which may be why it works now. My question is whether this behaviour is guaranteed by all compilers and explicitly specified in the standard. It is not obvious to me, as we are not explicitly giving the compiler any clue that the target is a 32-bit integer!

Also, please let me know whether any improvement, such as vectorization, is possible to increase the speed (maybe using C++11), as I need to convert a huge amount of 24-bit data to 32-bit.

Soo
  • Is [this](https://stackoverflow.com/questions/30473958/what-is-going-on-with-bitwise-operators-and-integer-promotion) helpful? – ChrisD Dec 15 '19 at 17:15
  • 1
    Note that the code you presented above is not C++ since `byte[] byteArray` is not valid C++ syntax. Is this really about C++ or maybe some other language!? What type is `byte`? – Michael Kenzel Dec 15 '19 at 17:19
  • 1
    `sizeof(int)` is not necessarily 4 though... – Jarod42 Dec 15 '19 at 17:20
  • `byteArray[0] << 24` invokes undefined behavior, if `byteArray[0] > 127`, because it overflows. – mch Dec 15 '19 at 17:21
  • 2
    @MichaelKenzel: there is [std::byte](https://en.cppreference.com/w/cpp/types/byte) (C++17) – Jarod42 Dec 15 '19 at 17:22
  • @MichaelKenzel I have changed byte to unsigned char – Soo Dec 15 '19 at 17:22
  • `char *` is even worse than the syntax error with `[]` between type and name. – mch Dec 15 '19 at 17:23
  • 1
    @Soo no, you changed it to `char`, not to `unsigned char`. – mch Dec 15 '19 at 17:23
  • If you're *just* worried about the 32-bitness of the target `int` then you can use `size_t s = sizeof(int)` and change the `<<` values accordingly. If you're also concerned about a byte *not* being 8 bits (which *is* allowed) then you'll also need to do some tricks with `CHAR_BIT` (defined in `limits.h`). – Adrian Mole Dec 15 '19 at 17:25
  • @AdrianMole I am worried how the compiler can up-convert unsigned char to int. – Soo Dec 15 '19 at 17:30
  • Are you asking about the how? By copying the 8bit value into an 32bit register. Or are you asking about the why or if it is mandatory? – mch Dec 15 '19 at 17:33
  • 1
    I'm not 100% sure, but I think the `<<` operator *promotes* its arguments before it does the shift. Others here will be more certain. Certainly, the `24` literal is an `int` ***not*** anything shorter. – Adrian Mole Dec 15 '19 at 17:37

5 Answers

1
int32_t interpret24bitAsInt32(unsigned char* byteArray)
{     
    int32_t number =
        (((int32_t)byteArray[0]) << 16)
    |   (((int32_t)byteArray[1]) << 8)
    |   byteArray[2];
    if (number >= ((int32_t)1) << 23)
        //return (uint32_t)number | 0xFF000000u;
        return number - 16777216;
    return number;
}

This function should do what you want, and it avoids the undefined behavior caused by shifting a 1 into the sign bit of int.
The int32_t cast is only necessary if sizeof(int) < 4; otherwise the default integer promotion to int happens.

If someone does not like the if: It does not get translated to a conditional jump by the compiler (gcc 9.2): https://godbolt.org/z/JDnJM2
It leaves a cmovg.
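
For converting a large buffer, the same per-sample logic can be put into a simple loop. The following is only a sketch: `convert24to32` is a hypothetical name, the input is assumed to be packed big-endian 24-bit samples with 8-bit bytes, and `dst` is assumed to have room for `count` elements. Whether the compiler vectorizes the loop depends on the compiler and flags, so it is worth checking the generated assembly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts `count` packed 24-bit samples from `src`
// (3 * count bytes, most significant byte first) into `dst`.
inline void convert24to32(const unsigned char* src, std::int32_t* dst, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        const std::int32_t number =
              (static_cast<std::int32_t>(src[3 * i])     << 16)
            | (static_cast<std::int32_t>(src[3 * i + 1]) << 8)
            |  static_cast<std::int32_t>(src[3 * i + 2]);
        // Values with bit 23 set are negative: subtract 2^24 to sign-extend.
        dst[i] = (number >= (INT32_C(1) << 23)) ? number - (INT32_C(1) << 24) : number;
    }
}
```

Keeping the loop body free of function calls and early exits gives the optimizer the best chance to unroll or vectorize it.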

mch
0

Integral promotions [conv.prom] are performed on the operands of a shift expression [expr.shift]/1. In your case, that means that your values of type unsigned char will be converted to type int before << is applied [conv.prom]/1. Thus, the C++ standard guarantees that the operands be "up-converted".

However, the standard only guarantees that int has at least 16 bits. There is also no guarantee that unsigned char has exactly 8 bits (it may have more). Thus, it is not guaranteed that int is always large enough to represent the result of these left shifts. If int does not happen to be large enough, the resulting signed integer overflow will invoke undefined behavior [expr]/4. Chances are that int has 32 bits on your target platform and, thus, everything works out in the end.
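
These platform assumptions can be checked at compile time. A minimal sketch, assuming the build should simply fail on a platform where they do not hold (the messages are illustrative):

```cpp
#include <climits>
#include <limits>

// A byte is assumed to be exactly 8 bits wide.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// numeric_limits<int>::digits counts the non-sign bits, so 31 means a 32-bit int.
static_assert(std::numeric_limits<int>::digits >= 31,
              "this code assumes int has at least 32 bits");
```

On a typical desktop or server target both assertions pass silently; on an exotic target the compiler reports the violated assumption instead of producing code with undefined behavior.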

If you need to work with a guaranteed, fixed number of bits, I would generally recommend using fixed-width integer types, for example:

std::int32_t interpret24bitAsInt32(const std::uint8_t* byteArray)
{     
    return
        static_cast<std::int32_t>(
            (std::uint32_t(byteArray[0]) << 24) | 
            (std::uint32_t(byteArray[1]) << 16) | 
            (std::uint32_t(byteArray[2]) <<  8)
        ) >> 8;
}

Note that right shift of a negative value is currently implementation-defined [expr.shift]/3. Thus, it is not strictly guaranteed that this code will end up performing sign extension on a negative number. However, your compiler is required to document what exactly right-shifting a negative integer does [defns.impl.defined] (i.e., you can go and make sure it does what you need). And I have never heard of a compiler that does not implement right shift of a negative value as an arithmetic shift in practice. Also, it looks like C++20 is going to mandate arithmetic shift behavior…
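
If relying on the right shift of a negative value is undesirable, the sign extension can also be done with plain arithmetic. The sketch below uses a different technique from the code above (XOR the sign bit, then subtract it). `signExtend24` and `interpret24bitAsInt32Portable` are hypothetical names, and 8-bit bytes in big-endian order are assumed.

```cpp
#include <cstdint>

// Hypothetical helper: raw24 is assumed to hold the value in its low 24 bits,
// with the upper 8 bits zero.
constexpr std::int32_t signExtend24(std::uint32_t raw24)
{
    // XOR flips bit 23 (the 24-bit sign bit); subtracting 2^23 afterwards maps
    // [0, 2^23) to itself and [2^23, 2^24) to [-2^23, 0).
    return static_cast<std::int32_t>(raw24 ^ UINT32_C(0x800000)) - INT32_C(0x800000);
}

inline std::int32_t interpret24bitAsInt32Portable(const unsigned char* byteArray)
{
    // Assemble the bytes in unsigned arithmetic so no shift can overflow.
    const std::uint32_t raw =
          (static_cast<std::uint32_t>(byteArray[0]) << 16)
        | (static_cast<std::uint32_t>(byteArray[1]) << 8)
        |  static_cast<std::uint32_t>(byteArray[2]);
    return signExtend24(raw);
}
```

The intermediate value never exceeds 0xFFFFFF, so the conversion to std::int32_t is value-preserving and no shift is applied to a negative number.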

Michael Kenzel
  • 1
    `(std::int32_t(byteArray[0]) << 24` is undefined if `byteArray[0] > 127`, it will shift a `1` into the sign bit of `int32_t`. – mch Dec 15 '19 at 17:58
0

[expr.shift]/1 The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand...

[conv.prom] 7.6 Integral promotions

1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

So yes, the standard requires that an argument of a shift operator, that has the type unsigned char, be promoted to int before the evaluation.
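
The promotion can be observed directly in the type of the shift expression. A small sketch, where `ShiftResult` is a hypothetical alias and the assertion assumes a platform where int can represent every unsigned char value (otherwise the promoted type would be unsigned int):

```cpp
#include <type_traits>
#include <utility>

// The type of a shift expression is the type of its promoted left operand.
using ShiftResult = decltype(std::declval<unsigned char>() << 24);

static_assert(std::is_same<ShiftResult, int>::value,
              "unsigned char is promoted to int before the shift");
```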


That said, the technique in your code relies on int a) being 32 bits large, and b) using two's-complement to represent negative values. Neither of which is guaranteed by the standard, though it's common with modern systems.

Igor Tandetnik
0

A version without branch; but multiplication:

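int32_t interpret24bitAsInt32(unsigned char* bytes) {
  // 0xFF if the sign bit (bit 7 of the first byte) is set, otherwise 0.
  uint32_t const msb = UINT32_C(0xFF) * (bytes[0] >> 7);
  // Cast to uint32_t before shifting so the shifts happen in unsigned arithmetic.
  uint32_t const number =
        (msb << 24)
      | (uint32_t(bytes[0]) << 16)
      | (uint32_t(bytes[1]) << 8)
      |  bytes[2];
  return static_cast<int32_t>(number);
}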

You need to test if omitting the branch really gives you a performance advantage, though!

Adapted from older code of mine which did this for 10-bit numbers. Test before use!

Oh, and it still relies upon implementation defined behaviour with regards to the conversion uint32_t to int32_t. If you want to go down that rabbit hole, have fun but be warned.

Or, much simpler: use the trick from mch's answer, and also use shifts instead of multiplication:

int32_t interpret24bitAsInt32(unsigned char* bytes) {
  int32_t const number =
        (bytes[0] << INT32_C(16))
      | (bytes[1] << INT32_C(8))
      |  bytes[2];
  int32_t const correction = 
     (bytes[0] >> UINT8_C(7)) << INT32_C(24);
  return number - correction;
}

Test case

Daniel Jour
0

Integral promotion is indeed applied to types smaller than int by the arithmetic operators.

So assuming sizeof(char) < sizeof(int)

in

byteArray[0] << 24

byteArray[0] is promoted to int and the bit shift is done on int.

The first issue is that int may be only 16 bits wide.

The second issue (before C++20) is that int is signed, and bitwise shifts on signed values can easily lead to implementation-defined or undefined behavior (you get both for negative 24-bit numbers).

In C++20, the behavior of bitwise shifts has been simplified (it is now defined) and the problematic UB has been removed too.

The leading 1 bits of a negative number are kept in neg >> 8.

So before C++20, you have to do something like:

std::int32_t interpret24bitAsInt32(const unsigned char* byteArray)
{
    const std::int32_t res =
        (std::int32_t(byteArray[0]) << 16)
      | (std::int32_t(byteArray[1]) << 8)
      | byteArray[2];
    // Largest non-negative 24-bit two's-complement value: 2^23 - 1.
    const std::int32_t int24Max = (std::int32_t(1) << 23) - 1;
    return res <= int24Max ?
               res : // Non-negative 24-bit numbers
               res - (std::int32_t(1) << 24); // Negative numbers: subtract 2^24
}
Jarod42