8

The other day I upgraded my Windows build environment from MSVC2013 to MSVC2017, and lo and behold, a function in my program that had been working fine for years (and still works fine under g++/clang) suddenly started giving incorrect results when compiled with MSVC2017.

I was able to rewrite the function to give correct results again, but the experience made me curious -- was my function invoking undefined behavior (that just happened to give correct results until now), or was the code well-defined and MSVC2017 was being buggy?

Below is a trivial program showing a toy version of the function both before and after I rewrote it. In particular, does the function maybe_invokes_undefined_behavior(), as shown below, invoke undefined behavior, when called with an argument of value -32762?

#include <stdio.h>

enum {ciFirstToken = -32768};

// This function sometimes gives unexpected results under MSVC2017
void maybe_invokes_undefined_behavior(short token)
{
   if (token >= 0) return;

   token -= ciFirstToken;  // does this invoke undefined behavior if (token==-32762) and (ciFirstToken==-32768)?
   if (token == 6)
   {
      printf("Token is 6, as expected (unexpected behavior not reproduced)\n");
   }
   else
   {
      printf("Token should now be 6, but it's actually %i\n", (int) token);  // under MSVC2017 this prints -65530 !?
   }
}

// This function is rewritten to use int-math instead of short-math and always gives the expected result
void allgood(short token16)
{
   if (token16 >= 0) return;

   int token = token16;
   token -= ciFirstToken;
   if (token == 6)
   {
      printf("Token is 6, as expected (odd behavior not reproduced)\n");
   }
   else
   {
      printf("Token should now be 6, but it's actually %i\n", (int) token);  
   }
}

int main(int, char **)
{
   maybe_invokes_undefined_behavior(-32762);
   allgood(-32762);
   return 0;
}
Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
  • 2
    Which version of VS2017 are you using? I can't reproduce the problem with v15.7.3 (x86 or x64, using /Od or /Ox). – 1201ProgramAlarm Jun 26 '18 at 01:30
  • I'm using v15.7.3, IIRC. – Jeremy Friesner Jun 26 '18 at 02:29
  • What is `INT_MAX` on this system? – M.M Jun 26 '18 at 03:40
  • Also can you show the exact unedited output of the program in the alleged buggy instance? – M.M Jun 26 '18 at 03:56
  • 2
    Seeing as people cannot reproduce the problem it would be helpful to post the exact compiler version and compilation switches to reproduce – M.M Jun 26 '18 at 04:02
  • 5
    They key problem of course is that `-(-32768)` in short math overflows with Undefined Behavior. The MSVC bug seems to be that it's transforming the valid code above to another form, and accidentally introduces that UB. – MSalters Jun 26 '18 at 08:00
  • The compiler prints its version on each invocation, should be something like 19.13.26129 (this version doesn't seem to reproduce the bug). – n. m. could be an AI Jun 26 '18 at 10:47
  • 2
    You really need to tell us the values of `SHORT_MIN` and `SHORT_MAX` on your platform for any determination of what's defined and what's not. Without that information, the question is close to meaningless. – Toby Speight Jun 26 '18 at 11:04
  • @TobySpeight: It's just MSVC++, so -32768/+32767 – MSalters Jun 26 '18 at 11:34
  • @MSalters, it should still be in the question, rather than assuming that everyone on SO knows (and can remember) the implementation-defined limits of every compiler/target combination. – Toby Speight Jun 26 '18 at 12:28
  • If `ciFirstToken` was a 16-bit _unsigned_ and `short` was 32-bit, output `-65530` would make sense. – chux - Reinstate Monica Jun 26 '18 at 15:12
  • If `int` was 16-bit (2's complement) , then `enum {ciFirstToken = -32768};` would be a problem (UB) fixable with `enum {ciFirstToken = -32767 - 1};`. Yet a 16-bit `int` is not likely with MSVC. I am wondering if some setting was forcing `enum` to 16-bit or 16-bit unsigned? – chux - Reinstate Monica Jun 26 '18 at 15:24
  • Did the program output the "unexpected value" message with a value of 6, or did it show the message along with some other value? I wonder if the compiler tried to get too clever and determine what value `token` would need to have to satisfy the `if`, and failed to account for wrapping in such computation? – supercat Jun 26 '18 at 19:00

1 Answers1

8

does this invoke undefined behavior if (token==-32762) and (ciFirstToken==-32768)?

token -= ciFirstToken;

NO (for the short answer)

Now let's break this down piece by piece.

1) As per expr.ass for compound assignment, -=:

The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.

the expression:

token -= ciFirstToken;

is equivalent to:

token = token - ciFirstToken;
//            ^ binary (not unary)

2) The additive operator (-) performs usual arithmetic conversion for operands of arithmetic type.

As per expr.arith.conv/1

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

(1.5) Otherwise, the integral promotions shall be performed on both operands.

3) Both operands are then promoted to an int.

As per conv.prom/1:

A prvalue of an integer type other than bool, char16_­t, char32_­t, or wchar_­t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type;

4) After the integer promotion, no further conversion is required.

As per expr.arith.conv/1.5.1

If both operands have the same type, no further conversion is needed.

5) Finally, Undefined behavior for expressions is defined as per expr.pre:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined


CONCLUSION

So now substituting the values:

token = -32762 - (-32768);

After all the integer promotions, both operands fall within the valid range of INT_MIN[1] and INT_MAX[2].

And after evaluation, the mathematical result (6) is then implicitly converted to short, which is within the valid range of short.

Thus, expression is well-formed.

Many thanks to @MSalters, @n.m and @Arne Vogel for helping with this answer.


Visual Studio 2015 MSVC14 Integer Limits and MS Integer Limits defines:

[1] INT_MIN    -2147483648
[2] INT_MAX   +2147483647

SHRT_MIN    –32768
SHRT_MAX   +32767

Joseph D.
  • 11,804
  • 3
  • 34
  • 67
  • "Visual Studio 2015 MSVC14 Integer Limits" are nice, yet it is the 2017 one's that may explain the issue - although I doubt it. – chux - Reinstate Monica Jun 26 '18 at 15:15
  • @chux, this is the MS reference for [integer limits](https://learn.microsoft.com/en-us/cpp/cpp/integer-limits). Most likely nothing has changed since the standard hasn't too. – Joseph D. Jun 26 '18 at 15:17
  • @chux Microsoft are the kings of backwards compatibility - they're the ones who kept `long` as 32 bits instead of moving to 64 bits like everyone else. The chances that they'd change those limits is nil. – Mark Ransom Jun 27 '18 at 01:21
  • @MarkRansom: What advantage are they foregoing by interpreting `long` as "the smallest size that's at least 32 bits"? Is there any reason that code that actually *wants* 64 bits shouldn't use `int64_t` or `long long`? – supercat Jun 29 '18 at 14:31
  • @supercat I wasn't criticizing their choice at all, just using it as evidence of their priorities. – Mark Ransom Jul 01 '18 at 16:30
  • @MarkRansom: Fair point. I wonder what other priorities drive different behaviors? IMHO, quality implementations for 64-bit platforms should offer a choice of whether "long" should be treated as 32 or 64 bits, just as quality implementations for the 68000 allowed "int" to be 16 or 32 bits. The 16/32-bit int selection was complicated by the fact that the choice affects how "int" values get passed on the stack--an issue which would not be a problem on most 64-bit platforms which pass 32-bit and 64-bit values in compatible fashion. – supercat Jul 01 '18 at 18:36