Bitwise operation results in unexpected variable size

Question

Context

We are porting C code that was originally compiled using an 8-bit C compiler for the PIC microcontroller. A common idiom that was used in order to prevent unsigned global variables (for example, error counters) from rolling over back to zero is the following:

if(~counter) counter++;

The bitwise operator here inverts all the bits and the statement is only true if counter is less than the maximum value. Importantly, this works regardless of the variable size.

Problem

We are now targeting a 32-bit ARM processor using GCC. We've noticed that the same code produces different results. So far as we can tell, it looks like the bitwise complement operation returns a value that is a different size than we would expect. To reproduce this, we compile, in GCC:

uint8_t i = 0;
int sz;

sz = sizeof(i);
printf("Size of variable: %d\n", sz); // Size of variable: 1

sz = sizeof(~i);
printf("Size of result: %d\n", sz); // Size of result: 4

In the first line of output, we get what we would expect: i is 1 byte. However, the bitwise complement of i is actually four bytes which causes a problem because comparisons with this now will not give the expected results. For example, if doing (where i is a properly-initialized uint8_t):

if(~i) i++;

we will see i "wrap around" from 0xFF back to 0x00. This behaviour is different in GCC compared with when it used to work as we intended in the previous compiler and 8-bit PIC microcontroller.

We are aware that we can resolve this by casting like so:

if((uint8_t)~i) i++;

or, by

if(i < 0xFF) i++;

however in both of these workarounds, the size of the variable must be known and is error-prone for the software developer. These kinds of upper bounds checks occur throughout the codebase. There are multiple sizes of variables (eg., uint16_t and unsigned char etc.) and changing these in an otherwise working codebase is not something we're looking forward to.

Question

Is our understanding of the problem correct, and are there options available to resolving this that do not require re-visiting each case where we've used this idiom? Is our assumption correct, that an operation like bitwise complement should return a result that is the same size as the operand? It seems like this would break, depending on processor architectures. I feel like I'm taking crazy pills and that C should be a bit more portable than this. Again, our understanding of this could be wrong.

On the surface this might not seem like a huge issue but this previously-working idiom is used in hundreds of locations and we're eager to understand this before proceeding with expensive changes.

Note: There is a seemingly similar but not exact duplicate question here: Bitwise operation on char gives 32 bit result

I didn't see the actual crux of the issue discussed there, namely, the result size of a bitwise complement being different than what's passed into the operator.

"Is our assumption correct, that an operation like bitwise complement should return a result that is the same size as the operand?" No, this is not correct, integer promotions apply. — Thomas Jager, Apr 15 '20 at 15:13
If you don't mind using GCC extensions, you could use `#define UMAX(x) ((typeof(x))-(typeof(x))1)` and `if (i < UMAX(i)) i++;`, which should work for all unsigned integer types, but uses GCC's `typeof` extension. Or `#define NOTUMAX(x) ((x) != (typeof(x))-(typeof(x))1)` and `if (NOTUMAX(i)) i++;`. — Ian Abbott, Apr 15 '20 at 15:33
duplicates: [Why does sizeof(char + char) return 4?](https://stackoverflow.com/q/35300925/995714), [What happens here? sizeof(short_int_variable + char_variable)](https://stackoverflow.com/q/16416227/995714), [Are chars automatically promoted in C expressions?](https://stackoverflow.com/q/32383507/995714) — phuclv, Apr 16 '20 at 01:49
Does this answer your question? [Is char default-promoted?](https://stackoverflow.com/questions/11985774/is-char-default-promoted) — phuclv, Apr 16 '20 at 01:49
While certainly relevant, I'm not convinced those are duplicates of this particular question, because they don't provide a solution to the problem. — Cody Gray - on strike, Apr 16 '20 at 04:27
*I feel like I'm taking crazy pills and that C should be a bit more portable than this.* If you didn't get integer promotions on 8-bit types, then your compiler wasn't C standard compatible. In that case I think you should go through *all* computations to check them and fix if necessary. — user694733, Apr 16 '20 at 07:29
Am I the only one wondering what logic, apart from really unimportant counters, can take it to "increment if there is space enough, else forget it"? If you are porting code, can you use int (4 bytes) instead of uint_8? That would prevent your problem in many cases. — puck, Apr 16 '20 at 10:30
@puck You're right, we could change it to 4 bytes, but it would break compatibility when communicating with existing systems. The intent is to know when there are *any* errors and so a 1-byte counter was originally sufficient, and remains so. — Charlie Salts, Apr 16 '20 at 21:04
@user694733 The previous hardware was an 8-bit CPU, so that may affect how integer promotion was happening, if at all. Regarding checking the code: yes that's what we are now in the process of doing. — Charlie Salts, Apr 16 '20 at 21:16

dbush · Accepted Answer · 2020-04-15T18:47:37.340

30

What you are seeing is the result of integer promotions. In most cases where an integer value is used in an expression, if the type of the value is smaller than int the value is promoted to int. This is documented in section 6.3.1.1p2 of the C standard:

The following may be used in an expression wherever an intor unsigned int may be used

An object or expression with an integer type (other than intor unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.

A bit-field of type _Bool, int ,signed int, orunsigned int`.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

So if a variable has type uint8_t and the value 255, using any operator other than a cast or assignment on it will first convert it to type int with the value 255 before performing the operation. This is why sizeof(~i) gives you 4 instead of 1.

Section 6.5.3.3 describes that integer promotions apply to the ~ operator:

The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.

So assuming a 32 bit int, if counter has the 8 bit value 0xff it is converted to the 32 bit value 0x000000ff, and applying ~ to it gives you 0xffffff00.

Probably the simplest way to handle this is without having to know the type is to check if the value is 0 after incrementing, and if so decrement it.

if (!++counter) counter--;

The wraparound of unsigned integers works in both directions, so decrementing a value of 0 gives you the largest positive value.

edited Apr 15 '20 at 18:47

answered Apr 15 '20 at 15:19

dbush

205,898
23
218
273

1

`if (!++counter) --counter;` might be less weird to some programmers than using the comma operator. – Eric Postpischil Apr 15 '20 at 17:21
1

Another alternative is `++counter; counter -= !counter;`. – Eric Postpischil Apr 15 '20 at 18:43
@EricPostpischil Actually, I like your first option better. Edited. – dbush Apr 15 '20 at 18:49
15

This is ugly and unreadable no matter how you write it. If you have to use an idiom like this, **do every maintenance programmer a favor and wrap it up as an inline function**: something like `increment_unsigned_without_wraparound` or `increment_with_saturation`. Personally, I'd use a generic tri-operand `clamp` function. – Cody Gray - on strike Apr 16 '20 at 03:42
@CodyGray: I don't see a good clamp option. It's not like you can do `clamp(some_unsigned_int + 1, 0, UINT_MAX)` and have that actually work. – user2357112 Apr 16 '20 at 06:39
5

Also, you can't make this a function, because it has to behave differently for different argument types. You'd have to use a [type-generic macro](http://port70.net/~nsz/c/c11/n1570.html#6.5.1.1). – user2357112 Apr 16 '20 at 06:46
@user694733: There is no signed overflow in that case, because 256 is within the representable range for an int. `++counter` is [equivalent](http://port70.net/~nsz/c/c11/n1570.html#6.5.3.1p2) to `(counter+=1)`, which computes `counter+1` (with no overflow) and assigns the result back to `counter` (with defined overflow semantics). No UB occurs. – user2357112 Apr 16 '20 at 10:23
@user2357112supportsMonica You are right. I made mistake analyzing the code. But maybe that is a indication that this may not be the correct way to fix the issue. It's not intuitive at all. – user694733 Apr 16 '20 at 10:31
@user2357112supportsMonica: That's assuming that CHAR_MAX < INT_MAX, which I believe the C standard doesn't actually guarantee. On some systems they could be equal, in which case `char c = CHAR_MAX; c += 1` can be UB. (Of course, the OP is probably not compiling for such a system, since they're implicitly assuming 8-bit chars.) – Ilmari Karonen Apr 16 '20 at 13:16

bruno · Answer 2 · 2020-04-15T17:45:11.347

7

in sizeof(i); you request the size of the variable i, so 1

in sizeof(~i); you request the size of the type of the expression, which is an int, in your case 4

To use

if(~i)

to know if i does not value 255 (in your case with an the uint8_t) is not very readable, just do

if (i != 255)

and you will have a portable and readable code

There are multiple sizes of variables (eg., uint16_t and unsigned char etc.)

To manage any size of unsigned :

if (i != (((uintmax_t) 2 << (sizeof(i)*CHAR_BIT-1)) - 1))

The expression is constant, so computed at compile time.

#include <limits.h> for CHAR_BIT and #include <stdint.h> for uintmax_t

edited Apr 15 '20 at 17:45

answered Apr 15 '20 at 15:07

bruno

32,421
7
25
37

3

The question explicitly states they have multiple sizes to deal with, so `!= 255` is inadequate. – Eric Postpischil Apr 15 '20 at 17:17
@EricPostpischil ah yes, I forget that, so "if (i != ((1u << sizeof(i)*8) - 1))" supposing always unsigned ? – bruno Apr 15 '20 at 17:23
1

That will be undefined for `unsigned` objects since shifts of the full object width are not defined by the C standard, but it can be fixed with `(2u << sizeof(i)*CHAR_BIT-1) - 1`. – Eric Postpischil Apr 15 '20 at 17:26
oh yes ofc, CHAR_BIT, my bad – bruno Apr 15 '20 at 17:26
2

For safety with wider types, one might use `((uintmax_t) 2 << sizeof(i)*CHAR_BIT-1) - 1`. – Eric Postpischil Apr 15 '20 at 17:28

Eric Postpischil · Answer 3 · 2020-04-16T16:03:19.517

Here are several options for implementing “Add 1 to x but clamp at the maximum representable value,” given that x is some unsigned integer type:

Add one if and only if x is less than the maximum value representable in its type:
```
x += x < Maximum(x);
```
See the following item for the definition of Maximum. This method stands a good chance of being optimized by a compiler to efficient instructions such as a compare, some form of conditional set or move, and an add.
Compare to the largest value of the type:
```
if (x < ((uintmax_t) 2u << sizeof x * CHAR_BIT - 1) - 1) ++x
```
(This calculates 2^N, where N is the number of bits in x, by shifting 2 by N−1 bits. We do this instead of shifting 1 N bits because a shift by the number of bits in a type is not defined by the C standard. The CHAR_BIT macro may be unfamiliar to some; it is the number of bits in a byte, so sizeof x * CHAR_BIT is the number of bits in the type of x.)

This can be wrapped in a macro as desired for aesthetics and clarity:
```
#define Maximum(x) (((uintmax_t) 2u << sizeof (x) * CHAR_BIT - 1) - 1)
if (x < Maximum(x)) ++x;
```

Increment x and correct if it wraps to zero, using an if:

if (!++x) --x; // !++x is true if ++x wraps to zero.

Increment x and correct if it wraps to zero, using an expression:
```
++x; x -= !x;
```
This is is nominally branchless (sometimes beneficial for performance), but a compiler may implement it the same as above, using a branch if needed but possibly with unconditional instructions if the target architecture has suitable instructions.
A branchless option, using the above macro, is:
```
x += 1 - x/Maximum(x);
```
If x is the maximum of its type, this evaluates to x += 1-1. Otherwise, it is x += 1-0. However, division is somewhat slow on many architectures. A compiler may optimize this to instructions without division, depending on the compiler and the target architecture.

I just can't bring myself to upvote an answer that recommends using a macro. C has inline functions. You're not doing anything inside that macro definition that cannot be easily done inside of an inline function. And if you're going to use a macro, make sure you strategically parenthesize for clarity: operator << has very low precedence. Clang warns about this with `-Wshift-op-parentheses`. The good news is, an optimizing compiler is *not* going to generate a division here, so you don't have to worry about it being slow. — Cody Gray - on strike, Apr 16 '20 at 03:44
@CodyGray, if you think you can do this with a function, write an answer. — Carsten S, Apr 16 '20 at 09:45
@CodyGray: `sizeof x` cannot be implemented inside a C function because the `x` would have to be a parameter (or other expression) with some fixed type. It could not produce the size of whatever argument type the caller uses. A macro can. — Eric Postpischil, Apr 16 '20 at 11:00

score 2 · Answer 4 · edited Dec 28 '21 at 12:22

Before stdint.h the variable sizes can vary from compiler to compiler and the actual variable types in C are still int, long, etc and are still defined by the compiler author as to their size. Not some standard nor target specific assumptions. The author(s) then need to create stdint.h to map the two worlds, that is the purpose of stdint.h to map the uint_this that to int, long, short.

If you are porting code from another compiler and it uses char, short, int, long then you have to go through each type and do the port yourself, there is no way around it. And either you end up with the right size for the variable, the declaration changes but the code as written works....

if(~counter) counter++;

or...supply the mask or typecast directly

if((~counter)&0xFF) counter++;
if((uint_8)(~counter)) counter++;

At the end of the day if you want this code to work you have to port it to the new platform. Your choice as to how. Yes, you have to spend the time hit each case and do it right, otherwise you are going to keep coming back to this code which is even more expensive.

If you isolate the variable types on the code before porting and what size the variable types are, then isolate the variables that do this (should be easy to grep) and change their declarations using stdint.h definitions which hopefully won't change in the future, and you would be surprised but the wrong headers are used sometimes so even put checks in so you can sleep better at night

if(sizeof(uint_8)!=1) return(FAIL);

And while that style of coding works (if(~counter) counter++;), for portability desires now and in the future it is best to use a mask to specifically limit the size (and not rely on the declaration), do this when the code is written in the first place or just finish the port and then you won't have to re-port it again some other day. Or to make the code more readable then do the if x<0xFF then or x!=0xFF or something like that then the compiler can optimize it into the same code it would for any of these solutions, just makes it more readable and less risky...

Depends on how important the product is or how many times you want send out patches/updates or roll a truck or walk to the lab to fix the thing as to whether you try to find a quick solution or just touch the affected lines of code. if it is only a hundred or few that is not that big of a port.

score 0 · Answer 5 · edited Jun 20 '20 at 09:12

6.5.3.3 Unary arithmetic operators
...
4 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.

^{C 2011 Online Draft}

The issue is that the operand of ~ is being promoted to int before the operator is applied.

Unfortunately, I don't think there's an easy way out of this. Writing

if ( counter + 1 ) counter++;

won't help because promotions apply there as well. The only thing I can suggest is creating some symbolic constants for the maximum value you want that object to represent and testing against that:

#define MAX_COUNTER 255
...
if ( counter < MAX_COUNTER-1 ) counter++;

I appreciate the point about integer promotion - it looks like this is the issue we're running into. Worth pointing out, however, is that in your second code sample, the `-1` isn't needed, as this would cause the counter to settle at 254 (0xFE). In any case, this approach, as mentioned in my question, isn't ideal due to different variable sizes in the codebase that participate in this idiom. — Charlie Salts, Apr 15 '20 at 18:43

Bitwise operation results in unexpected variable size

5 Answers5