Wrong result with bitwise inclusive OR

Question

I can't figure out why inclusive OR returns the wrong result.

char arr[] = { 0x0a, 0xc0 };
uint16_t n{};

n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1]; // But now the n value is 0xffc0 instead of 0x0ac0.

What is the mistake in this example? Console app, MVS Community 2017.

Please, try `n = n | (unsigned char)arr[1];`. I guess, `0xff` is caused by [sign bit extension](https://en.wikipedia.org/wiki/Sign_extension) while converting `char` to `int`. — Scheff's Cat, May 07 '18 at 05:44
`arr[1] > 127` so it is a negative `char` value that is *sign-extended* on type promotion along with `uint16_t` in `n | arr[1]`. — David C. Rankin, May 07 '18 at 05:46
Or change types, e.g. `unsigned char arr[] = { 0x0a, 0xc0 };` — David C. Rankin, May 07 '18 at 05:52
General rule: avoid signed types when working with bitwise operations (and `char` signedness is implementation-defined); way too much stuff becomes implementation-defined (and some even undefined). — Matteo Italia, May 07 '18 at 06:07
Your latest public working draft is [C++ Standard - 7.6 Integral Promotions (working draft n4741)](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4741.pdf#section.7.6) (**note:** earlier drafts include it as section 4.5) — David C. Rankin, May 07 '18 at 06:24

Scheff's Cat · Accepted Answer · 2018-05-07T06:20:17.370

7

The unintended 0xff is caused by sign bit extension of 0xc0.

0xc0 = 0b11000000

Hence, the uppermost bit is set, and for a signed char that bit is the sign bit.

Please, note that all arithmetic and bitwise operations in C++ work with at least int (or unsigned int). Smaller types are promoted before the operation, and the result is truncated again when it is assigned back to the smaller type.

Please, note also that char may be signed or unsigned. This is implementation-defined. Obviously, it's signed in the case of OP. To prevent the unintended sign extension, the argument has to become unsigned (early enough).

Demonstration:

#include <cstdint>  // uint16_t
#include <iostream>

int main()
{
  char arr[] = { '\x0a', '\xc0' };
  uint16_t n{};

  n = arr[0]; // I get 0x000a here.
  n = n << 8; // Shift to the left and get 0x0a00 here.
  n = n | arr[1]; // But now the n value is 0xffc0 instead of 0x0ac0.
  std::cout << std::hex << "n (wrong): " << n << std::endl;
  n = arr[0]; // I get 0x000a here.
  n = n << 8; // Shift to the left and get 0x0a00 here.
  n = n | (unsigned char)arr[1]; // (unsigned char) prevents sign extension
  std::cout << std::hex << "n (right): " << n << std::endl;
  return 0;
}

Session:

g++ -std=c++11 -O2 -Wall -pthread main.cpp && ./a.out
n (wrong): ffc0
n (right): ac0

Live demo on coliru

Note:

I had to change
char arr[] = { 0x0a, 0xc0 };
to
char arr[] = { '\x0a', '\xc0' };
to get around serious compiler complaints. I guess these complaints were strongly related to this issue.

edited May 07 '18 at 06:20

answered May 07 '18 at 06:01


This is the correct answer, though it would probably be easier to declare `arr` as `unsigned char` entirely instead of casting the values in it every time – Aemyl May 07 '18 at 06:06
@Aemyl Yepp, agree but there might be a (separate) reason why it is `char`. – Scheff's Cat May 07 '18 at 06:07
Well, that's true – Aemyl May 07 '18 at 06:09
@Aemyl If data storage is done by `std::string` then you need the `(unsigned char)` trick. I often needed it e.g. with macros when ASCIIs above 127 are uninteresting but might kill your app otherwise. – Scheff's Cat May 07 '18 at 06:10
@FrankAK AFAIK, it initializes `n` with 0. If in doubt, `uint16_t n = 0;` would work as well. (Please, note the implicit casting of `(int)0` to `uint16_t`.) ;-) – Scheff's Cat May 07 '18 at 06:13
I can't compile it on G++ , got `error: expected ';' at end of declaration` error. – Frank AK May 07 '18 at 06:15

Kostas · Answer 2 · 2018-05-07T06:13:43.837

You have fallen victim to sign extension during integer promotion.

When 0xc0 is assigned to the second element of the array (char is signed by default in MVS), it is represented as follows:

arr[1] = 1100 0000, or in decimal -64

When this is combined with a uint16_t in n | arr[1], it is first promoted to an int with the value -64. The low 16 bits of that int are:

arr[1] = 1111 1111 1100 0000 = -64

due to the 2's complement representation of integers.

Therefore:

n          = 0000 1010 0000 0000 (0x0a00)
arr[1]     = 1111 1111 1100 0000 (after being promoted)

n | arr[1] = 1111 1111 1100 0000 = 0xffc0

score 0 · Answer 3 · answered May 07 '18 at 05:51

I got it to work correctly by doing:

int arr[] = { 0x0a, 0xc0 };
int n{};

n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1];
std::cout << n << std::endl;

If you leave the 'arr' array as char, 0xc0 is stored as a negative value and gets sign-extended when it is promoted to int.

nemo


Wouldn't `unsigned char arr[] = { 0x0a, 0xc0 };` be easier? – David C. Rankin May 07 '18 at 05:52
