15

Is left-shifting a negative int Undefined Behavior in C++11?

The relevant Standard passages here are from 5.8:

2/The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The part that confuses me is:

Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Should this be interpreted to mean that left-shifting any negative number is UB? Or does it only mean if you LS a negative and the result doesn't fit in the result type, then it's UB?

Moreover, the preceding clause says:

1/The shift operators << and >> group left-to-right. shift-expression: additive-expression shift-expression << additive-expression shift-expression >> additive-expression

The operands shall be of integral or unscoped enumeration type and integral promotions are performed.

The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

This makes it explicit that using a negative number for one of the operands is UB. If it were UB to use a negative for the other operand, I would expect that to be made clear here as well.

So, bottom line, is:

-1 << 1

Undefined Behavior?


@Angew provided a psudocode interpretation of the Standardese which succinctly expresses one possible (likely) valid interpretation. Others have questioned whether this question is really about the applicability of the language "behavior is undefined" versus our (StackOverflow's) use of the phrase "Undefined Behavior." This edit is to provide some more clarification on what I'm trying to ask.

@Angew's interpretation of the Standardese is:

if (typeof(E1) == unsigned integral)
  value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
  value = E1 * 2^E2;
else
  value = undefined;

What this question really boils down to is this -- is the correct interpretation actually:

value = E1 left-shift-by (E2)

switch (typeof(E1))
{
case unsigned integral :
  value = E1 * 2^E2 % blah blah;
  break;

case signed integral :
  if (E1 >= 0)
  { 
    if (representable(E1 * 2^E2))
    {
      value = E1 * 2^E2;
    }
    else
    {
      value = undefined;
    }
  }
  break;
}

?

Sidenote, in looking at this in terms of psudocode makes it fairly clear in my mind that @Agnew's interpretation is the correct one.

Community
  • 1
  • 1
John Dibling
  • 99,718
  • 31
  • 186
  • 324
  • I understand that the result value can be undefined because the representation of negative numbers is not explicitly set by the standard. But should this not mean that the result of the operation is undefined? I am not sure "behavior is undefined" equates exactly to the term "Undefined Behavior". I am more inclined to think the author meant the resulting value is undefined (but either way the language should be tightened). – Martin York Oct 25 '13 at 17:41
  • 1
    @LokiAstari There's already terminology in the standard for saying when the behavior is defined but no specific value is required; It will say the value is unspecified or indeterminate. I'd expect that language to be used if that were the intent. – bames53 Oct 25 '13 at 18:44
  • 11
    As long as we're language-lawyering… Post C++11, p2 has been modified by CWG 1457 like so: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1457 . The impact of this tweak is that one can shift a 1 bit of a signed type into the sign bit, and post-C++11 it is now not undefined behavior. This was done so that one could (e.g.) `1 << 31` to get `0x80000000` for a 32 bit int without the gods of UB reigning down on your head. This was a relief for authors of `std::numeric_limits::min()` which is often computed this way, and is also `constexpr`, meaning UB is caught at compile time. – Howard Hinnant Oct 25 '13 at 21:12
  • @HowardHinnant FYI, I reference you comment in my answer to this [question](http://stackoverflow.com/questions/21319413/why-do-constant-expressions-have-an-exclusion-for-undefined-behavior). – Shafik Yaghmour Jan 23 '14 at 23:11
  • 2
    @HowardHinnant considering your comment, you may be able to add some details on the rationale to this question: [Why was 1 << 31 changed to be implementation-defined in C++14?](http://stackoverflow.com/q/26331035/1708801). – Shafik Yaghmour Oct 13 '14 at 02:18
  • I wonder if (-1<<1)>>1 is defined. – Kun Aug 25 '19 at 06:59
  • @Kun: How could it be if `(-1<<1)` is undefined? – John Dibling Sep 12 '19 at 23:35

3 Answers3

9

Yes, I would say it's undefined. If we translate the standardese to pseudo-code:

if (typeof(E1) == unsigned integral)
  value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
  value = E1 * 2^E2;
else
  value = undefined;

I'd say the reason why they're explicit about the right-hand operand and not about the left-hand one is that the paragrpah you quote (the one with the right-hand operand case) applies to both left and right shifts.

For the left-hand operand, the ruling differs. Left-shifting a negative is undefined, right-shifting it is implementation-defined.

Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
  • +1 I see what you're saying. Any thoughts on why they are both just implementation defined? That is, why was it necesarry to shun left shifting negatives to the land of the lost toys? – John Dibling Oct 25 '13 at 15:55
  • @JohnDibling No idea, really. I'd say that's a question for a processor-architecture expert, and I know precious little about that. It's not the only asymetry in regards to bit shifts. Java has `>>>` but doesn't have `<<<`, for example. – Angew is no longer proud of SO Oct 25 '13 at 16:02
  • @Angew: `<<<` and `<<` are the same operation, so there's no point. But `>>>` and `>>` are two different operations. – Puppy Oct 25 '13 at 17:17
  • 2
    @DeadMG Well, you could technically also have "shift to the left, but leave the sign bit intact." – Angew is no longer proud of SO Oct 25 '13 at 17:24
  • Looking at your psudocode interpretation and the alternative I posited in the OP leaves little room in my mind for your interpretation to be incorrect. Unless someone has a compelling case counter to this, I'll accept this in a little while. – John Dibling Oct 25 '13 at 19:06
  • If in a language like Java you have no distinction between unsigned and signed integers, you cannot choose the conceptually correct rightsift based on type as you coulf in C++ and therefore need separate oerators `>>` , respectively `>>>`. Rightshift of signeds is implementatiion in C++, but you can check the behavior via (`static_assert`). Rightshifting (signed) integral values, which fills in with the MSB is _very_ handy for certain bit manipulation techniques. – koraxkorakos Dec 25 '13 at 23:05
  • @John Dibling: unfortunately, the C standard does not specify two's compliment storage for integers. So you can't know out of the gates that just leaving the "sign bit" intact will really do what you want it to. So it's up to the compiler for a given architecture to decided what the right behaviour should be for a left-shift operator. (The Java solution, providing an arithmetic left-shift as well as a logical left-shift, seems pretty awesome in comparison.) – Mike Apr 15 '14 at 17:27
  • 1
    @Mike: From what I read, the behavior was originally undefined because of the possibility that there might be an implementation somewhere that would trigger a trap in such a case. After decades without any such implementation ever having been found, it would have been changed to implementation-defined except that doing so would no longer allow compilers to regard as impossible any chain of events which would result in a left-shift of a negative number. – supercat Apr 24 '15 at 23:33
5

Should this be interpreted to mean that left-shifting any negative number is UB?

Yes, the behavior is undefined when given any negative number. The behavior is only defined when both of the following are true:

  • the number is non-negative
  • E1 × 2E2 is representable in the result type

That's literally what "if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined," is saying:

if X and Y
  then Z
else U
bames53
  • 86,085
  • 15
  • 179
  • 244
1

Answer as per the Question:

The question really is:

Can we equate the term "behavior is undefined" equates exactly to the term "Undefined Behavior".

As it is currently worded it means "Undefined Behavior."

Personal comment about the situation

But I am not convinced that is the authors intention.
If it is the authors intention, then we should probably have a note explaining why. But I am more inclined to believe the author meant that the result of that operation is undefined because the representation of negative numbers is not explicitly defined by the standard. If the representation of negative numbers is not explicitly defined for negatives, then moving the bits around would lead to an undefined value.

Either way, the wording (or explanation) needs to be tightened/expanded to make it less ambiguous.

Community
  • 1
  • 1
Martin York
  • 257,169
  • 86
  • 333
  • 562
  • +1 I can't imagine how much code would "break" if this was *actually* UB and compilers treated it as such. – Mysticial Oct 25 '13 at 17:49
  • 1
    @Mysticial: Remember UB means anything including the rational can happen. – Martin York Oct 25 '13 at 17:50
  • 3
    "the behaviour is undefined" is how the standard describes that something invokes UB. Just count the hits for "the behaviour is undefined", really. – Xeo Oct 25 '13 at 17:50
  • @Xeo: As I said I would err on the side of caution and agree. But I see no reason for it to be UB. I would think this is more on the grounds of Implementation Defined behavior. – Martin York Oct 25 '13 at 17:51
  • @Loki: No, the wording definitly means that something invokes UB. The real question is not if that is actually so, but which part of the previous condition the "otherwise" applies to. – Xeo Oct 25 '13 at 17:58
  • @Xeo Agreed, I am looking at `n3485` and the counts of `undefined behavior` Vs `behavior is undefined` and the tally is `62` to `73`. – Shafik Yaghmour Oct 25 '13 at 17:58
  • @Xeo: No I think it is clear that the above statement applies (it falls to the right of the otherwise). Yes I agree as it stands it also means UB. BUT I don't understand why it should, that is why I think it either needs an explanatory note or better wording. – Martin York Oct 25 '13 at 18:02
  • 2
    One valid reason for making it undefined rather than implementation-defined is that the compile-time evaluation of some shifts gives different results from the run-time evaluation of those same shifts. Making it implementation-defined would force implementations to choose, and choosing would mean the compiler's behaviour has to change, for the sole benefit of buggy programs. –  Oct 25 '13 at 18:14
  • @hvd could you add an answer with some more details? This thread started b/c John and I interpreted this differently in another question and I would love to understand it better. – Shafik Yaghmour Oct 25 '13 at 18:18
  • @ShafikYaghmour I don't think I have anything useful to say about the actual question, at least not without some reading first. What I could say that would be useful wouldn't be an answer to what's asked here. –  Oct 25 '13 at 18:28
  • 1
    +1, but this wasn't really what my question was about. It seems quite clear to me that when the Standard says "the behavior is undefined," they mean what we (StackOverflow) mean when we say "Undefined Behavior," either in the method of the calculations or in the resulting values. After all, if the calculation is undefined, so therefore must be the resulting value. By definition, everything that happens after an UB operation is itself an UB operation. – John Dibling Oct 25 '13 at 18:50
  • 1
    Consider this code: `int foo(int x) { int r = x << 20; if (x < -1000000) return -1; return 1; }` The compiler is free to assume that shifting left of negative numbers is UB, thus will never happen, thus the if statement can be ignored, and the entire routine becomes simply "return 1". A future version of clang is set to start making this optimization. Hilarity may ensue... – jorgbrown Apr 22 '15 at 16:42
  • @jorgbrown: That's perfectly valid for the compiler to do. I don't see your point as it applies to my answer. Its clearly undefined behavior. Based on the original version of the question I was postulating the question why it needs to be undefined behavior. – Martin York Apr 22 '15 at 16:49
  • 1
    @loki-astari Yeah, I know it's "valid", however when ANSI C 25 years ago decided to call it undefined, it was in order to acknowledge that on some architectures, arithmetic overflow trapped, and there was nothing you could do about it. The language spec was not intended to block access to efficient addition and shift instructions (except for those who didn't mind casting to unsigned and back). Anyway my comment wasn't directed at your answer, it was directed at some of the other comments. – jorgbrown Apr 24 '15 at 03:24
  • 1
    @jorgbrown: I wish some people with clout would split off a "sane C" standard which defined means by which programmers could say what behaviors they would acceptable in various situations, and compilers could either comply with requirements or refuse compilation. In situations where programmers would be willing to accept a partially-indeterminate value in case of overflow, but unbounded consequences would be unacceptable, a programmer could write code which read much cleaner and executed faster than code which was written to prevent overflow. – supercat Apr 24 '15 at 23:20
  • @jorgbrown: If someone is e.g. writing a picture-viewing utility and would be fine with having garbage data show up as arbitrary pixels, but doesn't want it to cause the laws of causality to be thrown out the window, requiring the person to replace `x=y+z;` with `x=(int)((unsigned)y+z);` is Just Plain Dumb, especially since the latter would constrain optimizers much more than would a sane interpretation of the former. I appreciate that some "friendly C" proposals would in most cases preclude useful optimization, but on the flip side, if the purpose of optimizers is to... – supercat Apr 24 '15 at 23:27
  • ...find the most efficient way of doing something **that meets programmer requirements**, forcing programmers to demand things they don't need (rigid wrapping-integer behavior) in order to get the things they do need (preservation of causality) is contrary to that objective. – supercat Apr 24 '15 at 23:29
  • @jorgbrown Regarding the code (`foo`) you showed; you mention a version of clang is "set to start making this optimization". Do you happen to have more details on this, e.g. which version? I'm trying to get any compiler to do above optimization to showcase danger of undefined behavior. – Daniel Jour Mar 12 '17 at 23:28
  • @jorgbrown: Minor nit: the behavior was defined in C89/C90, but in a fashion which may be illogical on non-two's-complement implementations. C99 made it undefined, but the rationale doesn't mention that as a change. The only logical interpretation I can see is that they wanted to let non-two's-complement implementations behave differently, and they didn't see any need to specify that two's-complement implementations keep behaving in C89 fashion since they thought everyone would recognize that it would be stupid for them to do otherwise. – supercat Oct 25 '17 at 16:11
  • @DanielJour The upcoming clang change was rejected as producing too little benefit for the compile-time it took. But there are still a few circumstances where something similar can kick in. Try this: "int32_t foo(int32_t x) { auto r = x << 32; if (x < 0) return r; return 1; }" Check godbolt for the gcc-vs-clang difference: https://godbolt.org/g/mMoNuH – jorgbrown Dec 14 '17 at 00:22