For example:

    unsigned char mask1 = 0x55; //01010101
    unsigned short int mask2 = 0x8055;//1000000001010101   
    unsigned short int res = 0; //0000000000000000
    res = mask1 | mask2;

So what is res now, in bits?

Does it convert mask1 from 2 bytes to 4? And will the "empty spaces" be zeroes?

I mean, in logical terms, will it work like this?

res = mask1 | mask2 = 01010101 | 1000000001010101 

  0000000001010101
| 1000000001010101
  ----------------
  1000000001010101
Vlad from Moscow
Pentakorr
  • _"and the "empty spaces" will be 0s? "_: what else do you expect? – Jabberwocky Jun 22 '23 at 09:07
  • @Jabberwocky: in this case the behavior is obvious, but what if `mask1` was defined as `char mask1 = 0x88;` ? – chqrlie Jun 22 '23 at 09:17
  • @chqrlie that's not quite the same question, with `char ...` we'd have sign extension issues, but fair enough. – Jabberwocky Jun 22 '23 at 09:41
  • "does it convert mask1 from 2 bytes to 4?" That may happen during calculating the result of bitwise OR operation but when that result is assigned to `res` only 2 bytes are kept and the upper part is chopped. – Gerhardh Jun 22 '23 at 09:59

3 Answers

Both operands of the expression

    mask1 | mask2;

undergo the integer promotions, which preserve their values: each operand is converted to int, or to unsigned int if int cannot represent all values of the operand's type.

From the C Standard (6.5.12 Bitwise inclusive OR operator)

3 The usual arithmetic conversions are performed on the operands.

and (6.3.1.8 Usual arithmetic conversions)

1 Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result....

This pattern is called the usual arithmetic conversions: [...] Otherwise (note: if neither operand has a floating type - added by me), the integer promotions are performed on both operands

and (6.3.1.1 Boolean, characters, and integers)

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

So, for example, the value 0x55 stored in an object of type unsigned char is promoted to an int with the value 0x00000055, provided that sizeof( int ) is equal to 4. Likewise, the value 0x8055 stored in an object of type unsigned short is promoted to an int with the value 0x00008055.

In this assignment

    res = mask1 | mask2;

the result will be converted back to the type unsigned short.

Vlad from Moscow
  • Technically, the integer promotion causes both values to convert to `int` if type `int` is larger than type `short`, otherwise both values are converted to `unsigned int`, but in this particular case, the result is the same. – chqrlie Jun 22 '23 at 09:11

This answer will assume a mainstream system with 8 bit character types, 16 bit short and 32 bit int (this is the case for all mainstream 32 and 64 bitters in the real world).

First check out Implicit type promotion rules. How this works in this particular case:

Each operator in C comes with its own little line stating how implicit promotions are handled. In case of |, we can peek at C17 6.5.12 "the bitwise inclusive OR operator":

Constraints

Each of the operands shall have integer type.

Semantics
The usual arithmetic conversions are performed on the operands.

As we learned from the link at the top of this post, the integer promotions are part of the usual arithmetic conversions. So in the expression res = mask1 | mask2;, both operands are small integer types and are therefore promoted to int, which is signed. That is a bit unfortunate, since we want to avoid bitwise arithmetic on signed operands like the plague, though in this specific case it makes no difference. Instead of 0x8055 and 0x55 we get 0x00008055 and 0x00000055, basically just zero padding.

Thus it is 100% equivalent to res = (int)mask1 | (int)mask2; and the result of mask1 | mask2 is of type int.


Next up, this is stored in res, which is of type unsigned short. What happens then is "conversion during assignment", C17 6.5.16.1:

In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

The specific rules for what this conversion entails are found in C17 6.3.1.3:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type

This conversion works like a modulus or, if you will, a binary truncation of the raw value in which the most significant bits that do not fit are simply discarded.

In this specific case we have an int with value 0x00008055, which already fits in an unsigned short, so the converted value is simply 0x8055.


A curious note regarding "conversion during assignment" is that it also happens on all of these lines:

    unsigned char mask1 = 0x55; //01010101
    unsigned short int mask2 = 0x8055;//1000000001010101
    unsigned short int res = 0;

The numbers here, 0x55 and so forth, are formally called integer constants. Integer constants in C have a type picked based on various intricate rules (C17 6.4.4.1) - I won't mention them here but for now we can note that an integer constant can never be of a smaller type than int. So during all of the above initializations, we have implicit conversion from int to the type of the left operand.

Lundin

what happens .. compare unsigned short int and unsigned char ...?

OP mostly has it.

Usual promotions

Operators like |, ^, &, +, - and others first promote each operand that is narrower than int/unsigned to int/unsigned with no change in value. If int covers the whole range of the narrow type, the operand becomes an int; otherwise it becomes an unsigned.

In OP's case, the unsigned short int likely promotes to int (or possibly unsigned if unsigned is 16-bit). The unsigned char certainly becomes an int.

Conversion to common type

The lower ranked operand is then converted to the type of the higher ranked one. This may involve a value change, such as a negative int being converted to an unsigned, or some integers being converted to floating point.

In OP's case, the 2 operands are then int (or possibly unsigned if unsigned is 16-bit). With OP's values, no value change occurs in this step.

Operator | applied

mask1 | mask2 then works almost as OP supposes, but on 2 ints.

  0b00000000`00000000`00000000`01010101
| 0b00000000`00000000`10000000`01010101
  -------------------------------------
  0b00000000`00000000`10000000`01010101

Or with 16-bit int/unsigned as 2 unsigned.

  0b00000000`01010101
| 0b10000000`01010101
  -------------------
  0b10000000`01010101

Assignment narrows the type

The int (or unsigned) result is then converted to unsigned short and then assigned.

When the value is representable in the new narrow type, that is the value saved. Otherwise, if the destination type is a signed integer, the value is converted in an implementation-defined manner. The most common implementation-defined manner simply uses the least significant bits. Otherwise, if the destination type is an unsigned integer, the least significant bits are used (i.e. "mod" (max value + 1)).

In OP's case, mask1 | mask2 yields a value in the unsigned short range and so 0b1000000001010101 is saved.

chux - Reinstate Monica