
It's gotten a lot of attention lately that signed integer overflow is officially undefined in C and C++. However, a given implementation may choose to define it; in C++, an implementation may set std::numeric_limits<signed T>::is_modulo to true to indicate that signed integer overflow is well-defined for that type and that it wraps the way unsigned integers do.

Visual C++ sets std::numeric_limits<signed int>::is_modulo to true. This has hardly been a reliable indicator, since GCC set it to true for years while still treating signed overflow as undefined. I have never encountered a case in which Visual C++'s optimizer has done anything but give wraparound behavior to signed integers - until earlier this week.

I found a case in which the optimizer emitted x86-64 assembly code that acted improperly if exactly INT_MAX was passed to a particular function. I can't tell whether it's a bug, because Visual C++'s documentation doesn't seem to state whether signed integer overflow is considered defined. So I'm wondering: is it supposed to be defined in Visual C++?

EDIT: I found this when reading about a nasty bug in Visual C++ 2013 Update 2 that wasn't in Update 1, where the following loop generates bad machine code if optimizations are enabled:

void func (int *b, int n)
{
  for (int i = 0; i < n; i++)
    b[i * (n + 1)] = 1;
}

That Update 2 bug causes the line in the loop body to be compiled as if it were b[i] = 1; (the loop turned into a rep stosd), which is clearly wrong.

What was really interesting was that there was weirdness in the previous version, Update 1. It generated code that didn't properly handle the case that n exactly equaled INT_MAX. Specifically, if n were INT_MAX, the multiplication would act as if n were long long instead of int - in other words, the addition n + 1 would not cause the result to become INT_MIN as it should.

This was the assembly code in Update 1:

    movsxd  rax, edx          ; RDX = 0x000000007FFFFFFF; RAX = 0x000000007FFFFFFF.
    test    edx, edx
    jle     short locret_76   ; Branch not taken, because EDX is nonnegative.
    lea     rdx, ds:4[rax*4]  ; RDX = RAX * 4 + 4; RDX becomes 0x0000000200000000.
    nop                       ; But it's wrong. RDX should now be 0xFFFFFFFE00000000.
loc_68:
    mov     dword ptr [rcx], 1
    add     rcx, rdx
    dec     rax
    jnz     short loc_68
locret_76:
    retn

The issue is that I don't know whether this is a compiler bug - in GCC and Clang, this wouldn't be a compiler bug, because those compilers consider signed integer overflow/underflow to be undefined. Whether this is a bug in Visual C++ depends on whether Visual C++ considers signed integer overflow/underflow to be undefined.

Every other case I've seen besides this one has shown Visual C++ to consider signed overflow/underflow to be defined, hence the mystery.

Myria
  • I found this: http://msdn.microsoft.com/en-us/library/dd570023.aspx – leewz Jun 29 '14 at 05:31
  • Related thread: http://stackoverflow.com/questions/13272959/is-numeric-limitsintis-modulo-logically-contradictory . It seems to me that it is still unclear in C++11 whether `is_modulo == true` is supposed to imply that the type has well-defined behaviour on overflow. – M.M Jun 29 '14 at 05:51
  • There's nothing special about `INT_MAX`. However, with wrapping `INT_MIN` has the property that `-INT_MIN` = `INT_MIN`. And it's just a couple of operations (or just one operation, namely +1) to go from `INT_MAX` to `INT_MIN`. – Cheers and hth. - Alf Jun 29 '14 at 05:51
  • post a **complete but minimal example** – Cheers and hth. - Alf Jun 29 '14 at 05:53
  • What is the prototype of this function? If it takes an `int`, then calling it with `INT_MAX` does not yield any overflow (and the bug stems from something else). If it takes a `short` (for example), then calling it with `INT_MAX` would obviously yield an overflow. – barak manos Jun 29 '14 at 06:24
  • @barakmanos Added long edit explaining more and showing the code from which it originates. The overflow comes from adding 1 to INT_MAX inside the function. – Myria Jun 29 '14 at 07:00
  • "GCC set this to true". It doesn't. – n. m. could be an AI Jun 29 '14 at 07:01
  • @Cheersandhth.-Alf My reply to barak also answers your questions. I wish Stack Overflow could allow multi-reply, but alas, spam. – Myria Jun 29 '14 at 07:01
  • @n.m. Note that I used the past tense and said "for years" =) It wasn't corrected until April 29, 2012. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22200#c42 – Myria Jun 29 '14 at 07:03
  • If `std::numeric_limits::is_modulo` is `true`, integer overflow must be defined, both according to the standard and to the principle of the least surprise. I'm not sure how one can treat a violation of this as anything other than a bug. – n. m. could be an AI Jun 29 '14 at 07:17
  • @n.m. I agree - it's either a bug in the compiler's handling of signed integers, or a bug in their numeric_limits header. The question of which one is wrong is entirely intent. In related news, I reported the Clang equivalent today, where the answer is obviously that the STL is the side that's wrong: http://llvm.org/bugs/show_bug.cgi?id=20158 – Myria Jun 29 '14 at 07:23
  • BTW the wording in C++03 was itself a bug, because it had no reasonable meaning at all. – n. m. could be an AI Jun 29 '14 at 07:31
  • Signed integer overflow is undefined behavior per the C/C++ committee. There's nothing a program can do to make it well defined. Compiler writers are free to do what they want when they encounter it. This makes no sense: *"... a given implementation may choose to define it"*. And it makes no sense to compare C and assembly language. Assembly language does not have the notion of "signed integer overflow ⇒ undefined behavior". Finally, Intel's compilers (ICC/ICPC) are ruthless about removing UB. ICC/ICPC breaks programs that appear correct under Clang/Comeaux/GCC/MSVC. – jww Jul 26 '14 at 10:37
  • @jww A given C++ implementation (a compiler) is free to define any undefined behavior by the C++ standard however it wants... including, say, that signed integer overflow obeys naive 2s complement modulo arithmetic. – Yakk - Adam Nevraumont Jul 26 '14 at 14:23
  • @Yakk - I disagree, but I might be nitpicking. A compiler cannot make an illegal program with undefined behavior anything other than what it is. The program will still be illegal and it will still have undefined behavior. The compiler may try to accommodate a user. MSVC is very accommodating, ICC/ICPC is not (ICC will simply remove it). He might be able to make his code above work with GCC using `-fwrapv`. Ian Lance Taylor has a good treatment on the subject at [Signed Overflow](http://web.archive.org/web/20120414040615/http://www.airs.com/blog/archives/120). – jww Jul 26 '14 at 14:43
  • @jww Programs with undefined behavior are not illegal. They simply have behavior not defined by the standard. Compiler writers are **free to define** what they will do, or leave it undefined, or summon demons to shoot out of the programmers nose. gcc with `-fwrapv` defines what happens on signed overflow, for example. – Yakk - Adam Nevraumont Jul 26 '14 at 14:50
  • @jww undefined behavior is not illegal, you are thinking of "ill formed" which should not compile. Undefined behavior is allowed in many cases because it is required to interact with hardware and thus the code is not portable and should not be expected to be well defined on another combination of compiler/hardware. – Mgetz Jul 28 '14 at 20:39

2 Answers


Found an interesting tidbit from back in 2016 (VS2015 Update 3):

The post talks about the new SSA optimizer they introduced in VS2015:

C++ Team Blog - Introducing a new, advanced Visual C++ code optimizer

... ... ...

Historically, Visual C++ did not take advantage of the fact that the C and C++ standards consider the result of overflowing signed operations undefined. Other compilers are very aggressive in this regard, which motivated the decision to implement some patterns which take advantage of undefined integer overflow behavior. We implemented the ones we thought were safe and didn’t impose any unnecessary security risks in generated code.

So there you have it. I read that as: "we never programmed in any extra bits to make use of this UB", but starting with VS2015 Update 3 there will be some.

I should note that even before that I'd be extremely wary, because for 64-bit code and 32-bit variables, if the compiler/optimizer simply puts the 32-bit signed int into a 64-bit register, you'll have undefined behavior no matter what. (As shown in "How not to code: Undefined behavior is closer than you think" - unfortunately, it's unclear from the blog post whether he used VS2015 pre or post Update 3.)

So my take on this whole affair is that MSVC always considered it UB, even though past optimizer versions did not take special advantage of the fact. The new SSA optimizer certainly seems to. (It would be interesting to test whether the -d2UndefIntOverflow- switch does its job.)

Martin Ba

Your example probably does have undefined behavior for n == INT_MAX, but not just because of signed integer overflow being undefined (which it may not be on the Microsoft compiler). Rather, you are probably invoking undefined out-of-bounds pointer arithmetic.

Demi
  • size_t is 64 bits on Windows x86-64. It is legal to allocate 16 GB of contiguous memory if your system has it. If you do `new int[UINT64_C(0x100000000)]`, and have memory to satisfy the request, you can add `INT_MIN` to this pointer to get a valid pointer. At least one write to `b[1 * INT_MIN]` ought to succeed before crashing. If you have a massive chunk of reserved uncommitted memory and a `__try` block, it's even fully-defined in Win64 that an exception that can be handled will occur at `b[2 * INT_MIN]`. In other words, without this overflow issue, it's defined in Win64. – Myria Aug 07 '14 at 03:44