Bit shifting `char` vs. `unsigned char`

Question

I need to convert two bytes in char pcm[] into one short in short pcm_[]. This post used a C-style cast, which I first tried out in my C++ program (using Qt):

#include <QCoreApplication>

#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    char pcm[2] = {0xA1, 0x12};
    qDebug()<<pcm[0]<<pcm[1];

    short pcm_ = ( pcm[1] << 8 )| pcm[0];
    qDebug()<<pcm_;

    short pcm_2 =  ((unsigned char)(pcm[1])) << 8| (unsigned char) pcm[0];
    qDebug()<<pcm_2;

    return a.exec();
}

I figured out that it only works if I use unsigned char in the bit shifting, but I do not understand why this is necessary, as the input is a char.

Moreover, I would like to use a C++-style cast, and came up with this one:

short pcm_3 = (static_cast<unsigned char>(pcm[1])) << 8|
               static_cast<unsigned char>(pcm[0]);
qDebug()<<pcm_3;

Again, I need to use unsigned char instead of char.

So I have 2 questions:

1. Is static_cast the right cast? I remember an example from somewhere that used reinterpret_cast, but reinterpret_cast does not work here.
2. Why do I have to use unsigned char?

I think you should cast to `unsigned short` or `unsigned int`. Vlad from Moscow's answer explains why it also works (thanks to the usual arithmetic conversions) if casting to `unsigned char` but I find this code rather confusing. — 5gon12eder, Jul 19 '15 at 12:10
@TheParamagneticCroissant only `0x12` is shifted in this code so he (perhaps accidentally) sidestepped that one — M.M, Jul 19 '15 at 12:13
Aside: if you want an 8-bit type or a 16-bit type, you should use `uint8_t`/`int8_t` or `uint16_t`/`int16_t`. Aside #2: when you're working with bits, you should always use unsigned types unless you *really* understand the quirks of signed types. — , Jul 19 '15 at 12:21

Vlad from Moscow · Answer 1 · 2015-07-19T12:19:26.730

According to the C Standard (6.5.12 Bitwise inclusive OR operator)

3 The usual arithmetic conversions are performed on the operands

The same is written in the C++ Standard (5.13 Bitwise inclusive OR operator)

1 The usual arithmetic conversions are performed;

The usual arithmetic conversions include the integer promotions. This means that in this expression

( pcm[1] << 8 )| pcm[0];

the operand pcm[0] is promoted to type int. If your compiler treats char as a signed type, the value 0xA1 is promoted to the signed int 0xFFFFFFA1 (provided that sizeof( int ) is equal to 4). That is, the sign bit is propagated.

Hence you will get an incorrect result. To avoid it, you should cast type char to type unsigned char. In this case the promoted value will look like 0x000000A1. In C++ it can be written like

static_cast<unsigned char>( pcm[0] )
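
As a minimal sketch of the difference, the two helpers below can be compared directly. The names `combine_naive` and `combine_unsigned` are made up for this illustration, and both take `signed char` explicitly so that the result does not depend on whether plain `char` is signed on a given compiler. The value -95 stands in for the byte 0xA1, assuming two's complement.

```cpp
// Illustrative helpers, not part of the original question.
// Both take signed char so the behaviour is the same on every compiler,
// regardless of whether plain char is signed.

// No casts: lo is promoted to int with sign extension, so a negative low
// byte sets every bit above bit 7 and overwrites the high byte.
constexpr int combine_naive(signed char lo, signed char hi) {
    return (hi << 8) | lo;
}

// Casting to unsigned char first makes the promoted values 0..255,
// so the upper bits of each promoted operand stay zero.
constexpr int combine_unsigned(signed char lo, signed char hi) {
    return (static_cast<unsigned char>(hi) << 8) | static_cast<unsigned char>(lo);
}

// -95 is the value a signed 8-bit char holds for the bit pattern 0xA1
// (assuming two's complement).
static_assert(combine_naive(-95, 0x12) == -95, "high byte lost to sign extension");
static_assert(combine_unsigned(-95, 0x12) == 0x12A1, "both bytes preserved");
```

With the low byte 0xA1 and the high byte 0x12, the version without casts yields -95 (0xFFFFFFA1 with a 32-bit int), while the version with casts yields 0x12A1.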

score 0 · Answer 2 · answered Jul 19 '15 at 11:59

You have to use unsigned char because of the promotion to int with operator |

Assuming int is 32 bits:

signed char 0xA1 becomes int 0xFFFFFFA1 (to keep same value)
unsigned char 0xA1 becomes 0x000000A1.

Jarod42


Sergey Kalinichenko · Answer 3 · 2015-07-19T12:25:42.290

The reason you need to cast char to unsigned char is that char is allowed to be a signed data type. In that case it is sign-extended when it is promoted to int before the | is performed, meaning that the upper bits are filled with ones and the result becomes negative for chars with the most significant bit set to 1:

char c = 200;
int a = c | 0; // a is -56 on systems where char is signed

In this example using static_cast or the C cast is a matter of style. Many C++ shops stay away from C casts, because they are harder to find in the source code, while static_casts are much easier to spot.

score 0 · Answer 4 · edited May 23 '17 at 12:06

You should cast the data to an unsigned type, because when you "expand" a signed character to a signed short, its 7th bit gets replicated to bits 8-15 of the short. So, from A1, which is 10100001, you get 1111111110100001.
According to this question and answer, reinterpret_cast is the last cast you should think of.
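
On why `reinterpret_cast` "does not work" in the question: `reinterpret_cast<unsigned char>(pcm[0])` does not compile, because `reinterpret_cast` does not convert between different integer types. It can convert the pointer instead. The sketch below shows that variant; `sample_at` is a hypothetical helper name, and the low-byte-first layout is taken from the question.

```cpp
#include <cstddef>

// Hypothetical helper: reads the 16-bit sample at index i (low byte first)
// from a char buffer by viewing the bytes through an unsigned char pointer.
inline short sample_at(const char* pcm, std::size_t i) {
    // reinterpret_cast is accepted here because it converts a pointer,
    // and unsigned char is allowed to alias any object.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(pcm);
    const unsigned value = static_cast<unsigned>(bytes[2 * i])
                         | (static_cast<unsigned>(bytes[2 * i + 1]) << 8);
    // Values above 0x7FFF rely on implementation-defined conversion
    // before C++20; on two's complement targets they wrap to negative.
    return static_cast<short>(value);
}
```

For a single value, `static_cast` as in the question is still the simpler choice; the pointer form mainly helps when a whole buffer is processed.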

answered Jul 19 '15 at 12:07

nsilent22


score 0 · Answer 5 · answered Jul 19 '15 at 12:14

The problem starts here:

char pcm[2] = {0xA1, 0x12};

On your system, char is signed, and has a range of -128 through 127. You try to assign 161 to a char. This is out of range.

In C and C++ the result of an out-of-range assignment to a signed type is implementation-defined. Usually, the compiler goes with the char that has the same representation, which is -95.

Then you promote this to int (by virtue of using it as operand of |), giving the int value -95 which has a representation starting with lots of 1 bits.

If you actually want to work with the value 161 you will need to use a data type that can hold that value, such as unsigned char. The simplest way is to make pcm[] have that type, rather than using casts.
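
A minimal sketch of that suggestion, assuming 16-bit samples stored low byte first as in the question. The function name `decode_samples` and the choice of `std::uint8_t`/`std::int16_t` from `<cstdint>` are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: combines pairs of bytes (low byte first) into
// 16-bit samples. Because the bytes are unsigned, no casts are needed
// before shifting. A trailing odd byte is ignored.
inline std::vector<std::int16_t> decode_samples(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Each uint8_t is promoted to a non-negative int in 0..255.
        const std::uint16_t raw =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        // Values above 0x7FFF wrap to negative on two's complement targets
        // (implementation-defined before C++20).
        samples.push_back(static_cast<std::int16_t>(raw));
    }
    return samples;
}
```

Called with the bytes 0xA1, 0x12 this returns a single sample with the value 0x12A1.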

M.M


Thanks for clarifying. Unfortunately I can't directly use an unsigned type, because I need this for audio recording, and the buffer (a subclass of `QIODevice`) that is used by `QAudioInput`, needs to reimplement the function `qint64 write(const char * data, qint64 maxSize)` that takes a `char`. – user2366975 Jul 19 '15 at 15:36
@user2366975 you can use `unsigned char` for your buffer, and then reinterpret-cast to `char *` when calling that function – M.M Jul 19 '15 at 22:39
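
A sketch of what that could look like. `write_bytes` below is only a stand-in with the same parameter shape as the `write` function mentioned above, not the Qt API itself, and `long long` stands in for `qint64`. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Stand-in for an API that only accepts const char*. It appends the bytes
// to a std::string so the example has something observable.
inline long long write_bytes(std::string& sink, const char* data, long long maxSize) {
    sink.append(data, static_cast<std::size_t>(maxSize));
    return maxSize;
}

// Keeps the buffer unsigned for bit manipulation and converts only the
// pointer at the call site. No bytes are copied or changed by the cast.
inline long long write_unsigned(std::string& sink, const unsigned char* data, long long size) {
    return write_bytes(sink, reinterpret_cast<const char*>(data), size);
}
```

The same pointer cast in the other direction lets a function that receives `const char *` read its input as `unsigned char`.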
