How does the compiler determine how many bits to invert when using the ~ operator?

Question

Something I'm always wondering when reading code like ~_BV(PB1) or ~1 is how the compiler knows how many bits to invert.

For example, ~1 is a completely different value depending on whether it is treated as an 8-, 16-, or 32-bit (unsigned) value: 254, 65534, or 4294967294.

So in the case of (let's say):

DDRB &= ~_BV(PB1);

With DDRB being a volatile unsigned char (i.e. 8 bits wide), are the upper bits just chopped off if the result is wider than 8 bits?

And is there a way to tell the compiler how many bits we want to invert (what the bit width of the number is) with the ~ operator?

Note: there is a question about the ~ operator here, but its answers are not satisfactory. They either explain the concept in general or use a fixed-width number as an example.

~ is a _unary_ operator. In `DDRB &= ~_BV(PB1);` , DDRB is _not_ an operand of ~, so its type is irrelevant to how many bits ~ flips. — Avi Berger, May 18 '23 at 17:48
@AviBerger Ah, thanks, that makes sense. I guess that means that if the number is bigger than 8 bits, it doesn't have an effect on the AND operation since the value is 1) out of range 2) a no-op for AND (1 & 1 = 1, 1 & 0 = 0), correct? — Marco, May 18 '23 at 17:51
[" the operator ~ performs integer promotions on its only operand."](https://en.cppreference.com/w/c/language/operator_arithmetic) and integer promotions are defined [here.](https://en.cppreference.com/w/c/language/conversion#Integer_promotions) . That will determine how many bits get flipped. Then similar rules (that you can find in the same references) come into play when you use the result as an operand to a binary operator. Got aspirin? — Avi Berger, May 18 '23 at 17:58

John Bollinger · Accepted Answer · 2023-05-18T18:07:07.873

The operand of the ~ operator is subject to the integer promotions, so if it is narrower than an int then it will be widened, in value-preserving fashion, to type int. All the bits of the promoted value are flipped, where the number of bits is a function of the data type.

If the operand is a constant then its (pre-promotion) type is determined by its lexical form. For example, 1 is an int, 1L is a long, and 1ULL is an unsigned long long. C does not provide integer constants narrower than type int.
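A small sketch of how the lexical type of a constant sets the width at which `~` works. The helper functions are hypothetical, and the values assume a two's complement representation, which every mainstream implementation uses (and C23 requires).

```c
#include <stddef.h>

/* Hypothetical helpers: each returns ~1 computed at the width of the constant's type. */
int inv_int_one(void) { return ~1; }                   /* int: every bit except bit 0 set, i.e. -2 */
unsigned int inv_uint_one(void) { return ~1u; }        /* unsigned int: UINT_MAX - 1 */
unsigned long long inv_ull_one(void) { return ~1ULL; } /* unsigned long long: ULLONG_MAX - 1 */

/* sizeof reports the size in bytes of the type that ~ produced. */
size_t size_of_inv_int(void) { return sizeof(~1); }
size_t size_of_inv_ull(void) { return sizeof(~1ULL); }
```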

So in the case of (let's say):
DDRB &= ~_BV(PB1);
With DDRB being a volatile unsigned char (i.e. 8 bits wide)

This seems to be tangential to the question, because DDRB is not an operand of the ~ operator, and you haven't provided any information about the thing that is.

are the upper bits just chopped off if the result is wider than 8 bits?

I guess you are asking there about assignment operators, not ~. In a valid assignment expression, the value of the right-hand operand of the assignment operator is converted to the type of the left-hand operand. If the value cannot be represented by the target type then the effect depends on the target type, and it might not be defined at all. For an unsigned integer type, yes, the upper bits are "chopped off", though the language spec describes that in different terms.
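To illustrate the "chopping off" for an unsigned target, here is a minimal sketch assuming `CHAR_BIT == 8` (true on AVR and on common desktop targets). Converting a value that does not fit into `unsigned char` reduces it modulo 2^8, which leaves exactly the low 8 bits. The helper name is hypothetical.

```c
/* Hypothetical helper: conversion to unsigned char is well-defined for any value;
   the result is the value reduced modulo UCHAR_MAX + 1 (2^CHAR_BIT). */
unsigned char to_uchar(long value) {
    return (unsigned char)value;
}
```

For example, `to_uchar(0x1FD)` yields `0xFD`, and `to_uchar(~1)` (that is, the `int` value -2) yields 254.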

And is there a way to tell the compiler how many bits we want to invert (what the bit width of the number is) with the ~ operator?

See above.
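In other words, the width is chosen by the operand's (promoted) type, and if you want exactly N bits you can convert the result to an N-bit unsigned type afterwards. A sketch using the `<stdint.h>` exact-width types (optional in the standard, but provided on AVR and common desktop targets) reproduces the three values from the question; the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: invert x, then keep exactly 8, 16, or 32 bits of the result. */
uint8_t  inv8(uint32_t x)  { return (uint8_t)~x; }
uint16_t inv16(uint32_t x) { return (uint16_t)~x; }
uint32_t inv32(uint32_t x) { return ~x; }
```

With an argument of 1 these return 254, 65534, and 4294967294 respectively.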

I'm sorry about the pointer confusion, the macro results in a pointer dereference so the type is `volatile unsigned char` and not `volatile unsigned char *`. I have updated my question accordingly. That is, `DDRB` is defined as `(*(volatile unsigned char *)some_fixed_addr)`. — Marco, May 18 '23 at 18:04
_If the value cannot be represented by the target type then the effect depends on the target type, and it might not be defined at all._ That's why I'm always a bit confused when people prefer to write `~something` instead of, say, `something ^ 0xFFu` where it's clear how many bits get inverted. But after what you said, I presume that this is moot, since `0xFFu` would also be an `unsigned int` type, correct? — Marco, May 18 '23 at 18:10
Yes, @Marco, `0xFFu` is an `unsigned int`. It is certainly possible to get yourself in trouble by not understanding the data type rules for constants or the promotion rules for operands of most operators, but consider that often what one wants with `~` really is "all the bits" and not a specific number of bits. — John Bollinger, May 18 '23 at 18:17
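A sketch of the difference discussed in these comments, assuming an 8-bit `unsigned char`, an `int` wider than 8 bits, and two's complement. `x ^ 0xFFu` changes only the low 8 bits, while `~x` flips every bit of the promoted `int`; once the `~` result is narrowed back to `unsigned char`, the two agree. The function names are hypothetical.

```c
/* Hypothetical helpers for an unsigned char operand x. */
unsigned int flip_low8(unsigned char x) {
    return x ^ 0xFFu;          /* x is promoted, then converted to unsigned int; only bits 0..7 change */
}

int flip_all(unsigned char x) {
    return ~x;                 /* x is promoted to int; every bit of the int is flipped */
}

unsigned char flip_all_narrowed(unsigned char x) {
    return (unsigned char)~x;  /* keep only the low 8 bits of the flipped int */
}
```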

score 3 · Answer 2 · answered May 18 '23 at 18:13

DDRB &= ~_BV(PB1);

With DDRB being a volatile unsigned char * (i.e. 8 bits wide) …

The question suggests the asker has some belief that the ~ operation is affected by DDRB. It is not.

In C, expressions form a tree, in which the operands of each operation may themselves be subexpressions with further operands, which may be further subexpressions. The semantics of each subexpression are then determined by the types of the operands in that subexpression. The context of the surrounding operators and operands is irrelevant (with a few exceptions, such as the rule that the operand of sizeof is generally not evaluated).
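A minimal sketch of that point (the names are hypothetical; the values assume two's complement and an 8-bit `unsigned char`): the subexpression `~x` is evaluated identically whether its result is later stored in a `long` or in an `unsigned char`. Only the final conversion differs.

```c
/* Hypothetical helpers: the same ~x, stored into destinations of different types. */
long store_in_long(unsigned char x) {
    long wide = ~x;            /* ~x is computed as an int (-2 for x == 1), then converted to long */
    return wide;
}

unsigned char store_in_uchar(unsigned char x) {
    unsigned char narrow = ~x; /* the same int result, then reduced modulo 2^CHAR_BIT (254 for x == 1) */
    return narrow;
}
```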

For ~, the C 2018 standard says in clause 6.5.3.3, paragraph 4:

… The integer promotions are performed on the operand, and the result has the promoted type…

This means the integer promotions are applied to the operand, _BV(PB1).

The integer promotions are specified in 6.3.1.1, paragraph 2. For an object or expression whose integer type has a conversion rank less than or equal to that of int and unsigned int, it says:

… If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int…

Thus, in a typical C implementation, if _BV(PB1) is a char or unsigned char, it is converted to int. If it is already an int or unsigned int, or has a wider type, it is left unchanged.
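One way to check which type `~` actually produces is C11 `_Generic`. In the sketch below, the `TYPE_NAME` macro and the helper functions are hypothetical. On a typical desktop target with a 32-bit `int`, both `unsigned char` and `unsigned short` promote to `int`. On AVR, where `int` is 16 bits, an `int` cannot represent every `unsigned short` value, so `unsigned short` is promoted to `unsigned int` instead: the "otherwise" branch of the rule quoted above.

```c
/* Hypothetical helper: maps an expression's type to a printable name (C11 _Generic).
   The controlling expression is not evaluated. */
#define TYPE_NAME(expr) _Generic((expr),          \
    int: "int",                                   \
    unsigned int: "unsigned int",                 \
    long: "long",                                 \
    unsigned long: "unsigned long",               \
    long long: "long long",                       \
    unsigned long long: "unsigned long long",     \
    default: "other")

const char *type_of_inv_uchar(void)  { unsigned char c = 1;      return TYPE_NAME(~c); }
const char *type_of_inv_ushort(void) { unsigned short s = 1;     return TYPE_NAME(~s); }
const char *type_of_inv_ull(void)    { unsigned long long u = 1; return TYPE_NAME(~u); }
```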

Then the ~ inverts all bits in the resulting value, whatever type it is. DDRB is irrelevant to that.

Then the &= must be performed. A &= B behaves largely like A = A & B, except that the lvalue A is evaluated only once. For &, 6.5.10, paragraph 3, says the usual arithmetic conversions are performed.

The usual arithmetic conversions are specified in 6.3.1.8, paragraph 1. The complete details are a bit complicated, involving considerations of floating-point, integer types, rank, and signedness. However, given that DDRB is an unsigned char, we can say the usual arithmetic conversions will convert it to the type resulting from the prior ~ operation. This effectively extends the DDRB value with 0 bits, and the widened value is then ANDed with the result of ~.

Finally, the assignment is performed. Assignment converts the value being assigned to the type of the destination operand. In this particular situation, that will not change the value, because the AND operation has already limited the value to what is representable in an unsigned char. In general, assignment to unsigned char converts the value to unsigned char by wrapping modulo 2^N, where N is the number of bits in an unsigned char. This is equivalent to removing the high bits, leaving only the low N bits.
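The whole walk-through can be traced off-target with an ordinary variable. In this sketch, `fake_ddrb` is a hypothetical stand-in for the memory-mapped `DDRB` register, `BV` mirrors avr-libc's `_BV(bit)`, which is defined as `(1 << (bit))`, and `PB1_BIT` is a hypothetical name for bit 1.

```c
/* Hypothetical stand-ins for the AVR definitions (not the real avr-libc headers). */
#define BV(bit) (1 << (bit))   /* int-typed, like avr-libc's _BV */
#define PB1_BIT 1

static volatile unsigned char fake_ddrb = 0xFF;  /* stand-in for the DDRB register */

/* The & step on its own: fake_ddrb is promoted to int (upper bits zero), ~BV(PB1_BIT) is an
   int with every bit set except bit 1, so the result is an int in the range 0..255. */
int and_step(void) {
    return fake_ddrb & ~BV(PB1_BIT);
}

/* The full statement from the question: clear bit 1 and leave the other bits alone. */
unsigned char clear_pb1(void) {
    fake_ddrb &= ~BV(PB1_BIT);
    return fake_ddrb;
}
```

Starting from `0xFF`, both the intermediate `&` result and the stored register value are `0xFD`.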

Thanks! This is a _really_ elaborate answer, exactly what I needed when I was first taught C!! — Marco, May 18 '23 at 18:15

score 1 · Answer 3 · answered May 18 '23 at 21:40

If there is any doubt, I normally tell the compiler very explicitly what I want by type-casting. E.g. (a minimal example):

#include <stdint.h>
...
uint16_t inv_a = ~((uint16_t)a);

This way, the reader of the code (possibly myself at a later time) does not need to remember the finer details of the C Standard to be sure what is going on and I do not need to spend time trouble-shooting faults due to wrong assumptions about implicit promotions.
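One caveat worth knowing, sketched below under the assumption of a 32-bit `int` (typical on desktop targets, but not on AVR, where `int` is 16 bits): the cast in `~((uint16_t)a)` does not stop the integer promotion, so `~` still operates on an `int`. The example above is correct because the result is stored in a `uint16_t`; used directly in a comparison, the un-narrowed value can surprise. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; they assume a 32-bit int. */
uint16_t inv16_stored(uint16_t a) {
    uint16_t inv_a = ~((uint16_t)a);  /* ~ yields an int; the assignment keeps the low 16 bits */
    return inv_a;
}

bool inv16_compares_equal(uint16_t a, uint16_t expected) {
    return ~((uint16_t)a) == expected;  /* compares the full int result, whose high bits are set */
}

bool inv16_narrowed_equal(uint16_t a, uint16_t expected) {
    return (uint16_t)~a == expected;    /* narrow to 16 bits first, then compare */
}
```

For `a == 0x00FF`, `inv16_stored` returns `0xFF00`, but `inv16_compares_equal(0x00FF, 0xFF00)` is false, while `inv16_narrowed_equal(0x00FF, 0xFF00)` is true.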
