
I read this great (if too broad) question, and encountered some UB I didn't know about before.

The main reason for UB I see from time to time is modifying a variable twice between two sequence points, with things like `x = x++;` or `z = y++ + ++y;`. Reading that modifying a variable twice between two sequence points is UB helped me see the underlying cause in these cases.

But what about things like a bit-shift by a negative amount (`int x = 8 << -1;`)? Is there a rule that can explain that, or should I memorize this as a unique UB case?

I looked here, and under the section Integer Overflows I found bit-shift with negatives listed, but I don't understand why they are related. When an `int` is shifted by too much, an overflow is caused, but IMO shifting by a negative amount is simply UB; the problem isn't the bits that go "over the edge"...

I also looked here, but that didn't answer my question:

The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
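
For concreteness, here is a small sketch of the cases the quoted rule covers (assuming a 32-bit int, so both commented-out lines fall under it):

#include <stdio.h>

int main(void) {
    int x = 8;
    /* int a = x << -1;   UB: the right operand is negative                 */
    /* int b = x << 32;   UB: the right operand >= the width of int (32)    */
    int c = x << 2;       /* well-defined: 8 * 4 == 32                      */
    printf("%d\n", c);
    return 0;
}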

So my questions are:

  1. Specifically, is a bit-shift by a negative amount considered integer overflow, and if so, why?
  2. If not, is it part of a bigger phenomenon?
  3. Are there (other) unique cases that can't be grouped under one underlying cause?
davmac
CIsForCookies
  • Negative shifts are undefined, as are over-long shifts (a shift of N or more bits for an N-bit integer type). The standard says so. You have to know that it says so. Yes, there are lots of cases, and grouping them is going to be tricky. Annex J.2 of the C11 standard documents undefined behaviours on pages 557-571 (there are only a few lines on each of the end pages, so it is a bit more than 14 pages). Read it periodically to understand what's undefined. No; I've not memorized it either. – Jonathan Leffler Jul 18 '17 at 05:58
  • @JonathanLeffler that's an impressive list, thx! though I hoped for something easier to memorize :) – CIsForCookies Jul 18 '17 at 06:05
  • To answer your part - *I hoped for something easier to memorize* - my general rule of thumb is: anything that would seem to change behavior with different implementations (target, platform, etc.) is a red flag to "spot" UB. Then I confirm with the list. It makes sense because the standard is defined on an abstract machine, not any implementation. Hence the aspects which touch implementations have to be left undefined. Be wary of implementation-defined behaviors. – Ajay Brahmakshatriya Jul 18 '17 at 07:21
  • IMHO, a negative shift is just logical nonsense: if you want to shift right, there is an operator for that, and it would be trickier to remember that a negative count reverses the direction than to make it UB. But the main reason is probably that most *shift/rotate* processor instructions work with positive shift counts only, or at least that there is no agreement on this. – Jean-Baptiste Yunès Jul 18 '17 at 07:24
  • @Jean-BaptisteYunès A more abstract version of your very convincing example would be "If it would probably be hard to make hardware suppliers agree on this, or would otherwise be hard to define clearly AND allow some room for optimised implementation, then it is probably UB". And that might be the "easy to memorize rule of thumb" OP is looking for. – Yunnosch Jul 18 '17 at 07:27
  • @Jean-BaptisteYunès I don't agree, because with that logic we should also make things like `x - -1` UB, since we should use `+ 1` instead... I think that someone who looks at it (a negative shift) for the first time will think of it as simple arithmetic – CIsForCookies Jul 18 '17 at 07:52
  • @AjayBrahmakshatriya - but that is a very vague rule... I didn't see any reason for a negative shift to be implementation-dependent (as I said before, I look at it as `x - -1`)! Also, I think it's the other way around: if the behavior is defined -> not implementation-dependent; else -> implementation-dependent. Isn't that so? – CIsForCookies Jul 18 '17 at 07:56
  • @Jean-BaptisteYunès I think the real reason for it being UB is that it is not possible for the compiler to know whether the second operand to `<<` is positive or negative. It would have to generate conditions, which might be costly on certain platforms. – Ajay Brahmakshatriya Jul 18 '17 at 08:14
  • @CIsForCookies see my comment above. Now the example of `x - -1` is different, because the hardware CAN perform subtraction with negative numbers. The compiler won't have to create checks. – Ajay Brahmakshatriya Jul 18 '17 at 08:15
  • @AjayBrahmakshatriya correct me if I'm wrong, but won't that be implementation-dependent too? (hw, but still...) As I see it, anyone can build their own HW with their own compiler and use C how they like, as long as they achieve the same results for the defined cases. In one of those machines the checks you described will be the same for subtraction and shift – CIsForCookies Jul 18 '17 at 08:19
  • @CIsForCookies regarding your "Also" part: implementation-defined behavior is different from undefined behavior. With IDB the implementation has to state what the behavior will be and has to ensure that it is as stated (it is binding on the implementation to document every point listed as IDB). With UB it doesn't have to say anything; it can even show different behavior every time a certain UB appears in the program. – Ajay Brahmakshatriya Jul 18 '17 at 08:20
  • @CIsForCookies Yes, on some hardware where subtraction is not possible on negative numbers, the compiler would have to generate checks. But the standard is written with some mainstream hardware (for lack of a better plural) in mind. They try to keep some things "simple" for commonly used architectures; others should just be able to run it, and in this case they can, with conditions. – Ajay Brahmakshatriya Jul 18 '17 at 08:22
  • @Yunnosch [Your comment](https://stackoverflow.com/questions/45158530/is-there-a-rule-to-spot-out-ub#comment77288604_45158530) is closest to the mark and not too long. C is a compromise (we need some level of agreement) within a compromise (lots of various platforms, then, now and future). – chux - Reinstate Monica Jul 18 '17 at 15:31
  • Excellent summary of types of UB: https://blog.regehr.org/archives/1520 – davmac Jul 18 '17 at 16:31

3 Answers


(Compiling an answer from comments, including mine.)

Good starting points for finding actual undefined behaviour (UB) are these two references, the first from Jonathan Leffler:

Yes, there are lots of cases, and grouping them is going to be tricky. Annex J.2 of the C11 standard documents undefined behaviours on pages 557-571 (there are only a few lines on each of the end pages, so it is a bit more than 14 pages).

The second is a related article which discusses types of UB and tools for spotting them, and contains a long and (by intention of the authors) complete list of UB (courtesy of davmac):
https://blog.regehr.org/archives/1520

Two approaches for something "memorizable":

  1. by Ajay Brahmakshatriya, focusing on unavoidable platform-dependency:

    my general rule of thumb is: anything that would seem to change behavior with different implementations (target, platform, etc.) is a red flag to "spot" UB

  2. by Yunnosch, focusing on the problem of balancing standardisation and optimisation:

    If it would probably be hard to make hardware suppliers agree on this, or would otherwise be hard to define clearly AND allow some room for optimised implementation, then it is probably UB.

Sadly, none of these "rules" is easy to apply. Checking the actual standard is inconvenient, and the two rules of thumb require quite some experience: you either need to have designed a few compilers and/or processors, or to have suffered a lot from the differences between them.

So the actual answer to "Is there an easy way to spot UB?" is probably simply "No."

Yunnosch
  • I would consider marking this as a community answer, since a large portion of this is not your words. If you do that, please remove the parenthetical comment at the top. – Patrick Roberts Jul 18 '17 at 16:27

Specifically, is a bit-shift by a negative amount considered integer overflow, and if so, why?

It is not: shifting 0 by any amount can never overflow, yet it is still undefined behaviour to shift a value of 0 by a negative amount. (I suppose you could consider it integer overflow if you first re-interpreted the shift amount as an unsigned integer, at which point it would be large and certainly beyond the allowed range; an actual shift by that amount, interpreted as multiplication by a power of 2, would certainly overflow if the shifted value were non-zero.)
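
For illustration (a minimal sketch, relying only on the rule quoted in the question), nothing here can possibly overflow, yet the shift is still undefined:

int shift_zero(void) {
    int zero = 0;
    return zero << -1;   /* UB: negative shift count, even though no bit could be "lost" */
}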

In short, a bit-shift by a negative amount yields undefined behaviour because the language standard says that it does.

If not, is it part of a bigger phenomenon?

John Regehr gives some broad categories of UB in a blog post. Shift by invalid amounts is in the "other UB" category...

Are there (other) unique cases that can't be grouped under one underlying cause?

Yes, see the above post. Among others (these are directly lifted from the blog post):

  • Pointers that do not point into, or just beyond, the same array object are subtracted (6.5.6).
  • An object has its stored value accessed other than by an lvalue of an allowable type (6.5)
  • A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment (5.1.1.2)

You could possibly categorise these and the other examples in some way, but it's up to you how you'd want to do that.
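
To make the first of those bullets concrete, here is a minimal sketch (the arrays `a` and `b` are made up for the example): subtracting pointers that point into two different arrays is undefined, even though each pointer is individually valid:

#include <stddef.h>

ptrdiff_t sketch(void) {
    int a[4], b[4];
    (void)b;                              /* b exists only for the commented-out line below */
    /* ptrdiff_t bad = &a[0] - &b[0]; */  /* UB: pointers into different array objects (6.5.6) */
    return &a[3] - &a[0];                 /* well-defined: the result is 3 */
}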

In particular, the last example above (about the source file not ending in a new-line) shows just how arbitrary some of the rules are.

davmac
  • Now that you have an answer with that content, I offer to remove your nice link from my answer, in order not to take your credit. Do you want me to? – Yunnosch Jul 18 '17 at 18:52

In the case of `x << y` with `y` negative, there are some platforms which will process something like `z = x << y` with microcode equivalent to:

unsigned temp = x;
unsigned count = y;   /* a negative y converts to a very large unsigned value... */
while (count--)       /* ...so this loop can iterate an enormous number of times */
  temp <<= 1;
z = temp;

If `y` is negative, that loop might run for a very long time; if it's handled at the microcode level (I think some Transputer chips were that way), it may disable interrupts for many minutes, which could disrupt other aspects of the system.

On most platforms it would cost nothing, outside of contrived scenarios, for a compiler to guarantee that `x << y` has no side effects for any values of `x` or `y`, beyond yielding a possibly-meaningless value; in fact it would be easier for a compiler to generate code without side effects than to do anything else. Unfortunately, some compiler writers think they should seek out "clever" ways of exploiting the fact that `y` "can't" be negative, triggering arbitrarily bad consequences, without regard for whether that is actually useful, perhaps in the mistaken belief that "clever" and "stupid" are antonyms.
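
As an illustrative sketch (not taken from any particular compiler), this is the kind of inference the previous paragraph warns about: since `x << y` is undefined for negative `y`, an optimizer is permitted to assume `y >= 0` at that point and delete the later check:

int f(int x, int y) {
    int z = x << y;   /* UB if y is negative or >= the width of int            */
    if (y < 0)        /* ...so an optimizer may treat this branch as dead code */
        return -1;
    return z;
}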

supercat
  • There are also platforms where only sufficient bits of `y` are connected to the shift unit to support the meaningful range of shift (i.e. as if `y` were masked down to the size of `x` in bits). – Toby Speight Jul 19 '17 at 16:40
  • @TobySpeight: Typically, `x< – supercat Jul 19 '17 at 17:29