4

I can't figure out why inclusive OR returns the wrong result.

char arr[] = { 0x0a, 0xc0 };
uint16_t n{};

n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1]; // But now the n value is 0xffc0 instead of 0x0ac0.

What is the mistake in this example? Console app, MVS Community 2017.

Konstantin
  • 2
    Please, try `n = n | (unsigned char)arr[1];`. I guess, `0xff` is caused by [sign bit extension](https://en.wikipedia.org/wiki/Sign_extension) while converting `char` to `int`. – Scheff's Cat May 07 '18 at 05:44
  • 2
    `arr[1] > 127` so it is a negative `char` value that is *sign-extended* on type promotion along with `uint16_t` in `n | arr[1]`. – David C. Rankin May 07 '18 at 05:46
  • 2
    Or change types, e.g. `unsigned char arr[] = { 0x0a, 0xc0 };` – David C. Rankin May 07 '18 at 05:52
  • General rule: avoid signed types when working with bitwise operations (and `char` signedness is implementation-defined); way too much stuff becomes implementation-defined (and some even undefined). – Matteo Italia May 07 '18 at 06:07
  • Your latest public working draft is [C++ Standard - 7.6 Integral Promotions (working draft n4741)](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4741.pdf#section.7.6) (**note:** earlier drafts include it as section 4.5) – David C. Rankin May 07 '18 at 06:24

3 Answers

7

The unintended 0xff is caused by sign bit extension of 0xc0.

0xc0 = 0b11000000

Hence, the uppermost bit is set, and for a signed char that bit is the sign bit.

Please note that all arithmetic and bitwise operations in C++ work with at least int (or unsigned int). Operands of smaller types are promoted before the operation, and the result is truncated again when it is assigned back to a smaller type.

Please note also that char may be signed or unsigned; this is implementation-defined. Obviously, it's signed in the OP's case. To prevent the unintended sign extension, the operand has to be converted to an unsigned type before it is promoted to int.

Demonstration:

#include <cstdint>
#include <iostream>

int main()
{
  char arr[] = { '\x0a', '\xc0' };
  std::uint16_t n{};

  n = arr[0]; // I get 0x000a here.
  n = n << 8; // Shift to the left and get 0x0a00 here.
  n = n | arr[1]; // But now the n value is 0xffc0 instead of 0x0ac0.
  std::cout << std::hex << "n (wrong): " << n << std::endl;
  n = arr[0]; // I get 0x000a here.
  n = n << 8; // Shift to the left and get 0x0a00 here.
  n = n | (unsigned char)arr[1]; // (unsigned char) prevents sign extension
  std::cout << std::hex << "n (right): " << n << std::endl;
  return 0;
}

Session:

g++ -std=c++11 -O2 -Wall -pthread main.cpp && ./a.out
n (wrong): ffc0
n (right): ac0

Live demo on coliru

Note:

I had to change
char arr[] = { 0x0a, 0xc0 };
to
char arr[] = { '\x0a', '\xc0' };
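to get around serious compiler complaints. I guess these complaints were closely related to this issue: 0xc0 is the int value 192, which does not fit into a signed char, and brace initialization rejects such narrowing conversions.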

Scheff's Cat
  • This is the correct answer, though it would probably be easier to declare `arr` as `unsigned char` entirely instead of casting the values in it every time – Aemyl May 07 '18 at 06:06
  • @Aemyl Yepp, agree but there might be a (separate) reason why it is `char`. – Scheff's Cat May 07 '18 at 06:07
  • Well, that's true – Aemyl May 07 '18 at 06:09
  • @Aemyl If data storage is done by `std::string` then you need the `(unsigned char)` trick. I often needed it e.g. with macros when ASCIIs above 127 are uninteresting but might kill your app otherwise. – Scheff's Cat May 07 '18 at 06:10
  • @FrankAK AFAIK, it initializes `n` with 0. If in doubt, `uint16_t n = 0;` would work as well. (Please, note the implicit casting of `(int)0` to `uint16_t`.) ;-) – Scheff's Cat May 07 '18 at 06:13
  • I can't compile it on G++ , got `error: expected ';' at end of declaration` error. – Frank AK May 07 '18 at 06:15
0

You have fallen victim to signed integer promotion.

When 0xc0 is assigned to the second element of the array (char is signed by default with MVS), it is represented as follows:

arr[1] = 1100 - 0000, or in decimal -64

In the expression n | arr[1], both operands are promoted to int before the OR is evaluated. arr[1] keeps its value -64, which sign extension turns into (low 16 bits shown):

arr[1]     = 1111 - 1111 - 1100 - 0000 = -64

due to the 2's complement implementation of integers.

Therefore:

n          = 0000 - 1010 - 0000 - 0000 (0x0a00)
arr[1]     = 1111 - 1111 - 1100 - 0000 (after being promoted)

n | arr[1] = 1111 - 1111 - 1100 - 0000 = 0xffc0
Kostas
0

I got it to work correctly by doing:

int arr[] = { 0x0a, 0xc0 };
int n{};

n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1];
std::cout << n << std::endl;

If you leave the 'arr' array as char, the value 0xc0 is stored as a negative number and is sign-extended when it is promoted to int for the OR.

nemo