
I understand that integer underflow and overflow are undefined.

However, given that C++ eventually compiles to assembly, isn't the behavior actually defined?

The bitwise representation stays the same and the integer format remains the same: 0111..11 will always roll over to 1000..00, and likewise for underflow. So why is it not considered defined behaviour?

About the assembly: I was going by the rudimentary assembly we were taught in school, but in Code::Blocks

int x = INT_MAX;
int y = x+1;

compiles to

00401326    movl   $0x7fffffff,0x8(%esp)
0040132E    mov    0x8(%esp),%eax
00401332    inc    %eax
00401333    mov    %eax,0xc(%esp)

Now, regardless of the value of x, won't there always be an inc or an add instruction? So where does the undefined behaviour arise?

Micha Wiedenmann
user87166
  • The question is interesting, but I think you should add C++ coding examples of both cases, and the dis-assembly that your compiler generates for each one of these cases. – barak manos Nov 16 '14 at 12:08
  • I have to agree with @barakmanos – MZaragoza Nov 16 '14 at 12:13
  • It is undefined in C++ because the various CPU in the world do not agree on a definition. For example, some CPU use "saturated" math where overflow results in the max value. – brian beuning Nov 16 '14 at 12:37
  • It's defined for the specific hardware it executes on, but not by the C++ standard. – user207421 Nov 17 '14 at 03:39
  • If compiler can assume, that something is undefined behavior it is for some reason. In most of them it's performance. The more assumptions compiler can take, the more optimized code it can produce. And still you can do "standard" operations on unsigned ints. – DawidPi Oct 11 '16 at 08:16

3 Answers

7

However, given that C++ eventually compiles to assembly, isn't the behavior actually defined?

No, since the compiler decides what kind of assembly it emits. If the compiler wishes, it can generate assembly that erases your hard disk if it encounters undefined behavior.

(Actually, it may not even be true that "C++ eventually compiles to assembly". There exist C++ interpreters, for example; the Standard doesn't specify how, or into what format, C++ should be compiled.)

One of the reasons why the creators of the Standard decided to leave it undefined is – as almost always – the opportunity for optimizations. If signed overflow is UB, then the compiler can, for instance, assume that x + 1 > x is always true and generate simpler/shorter/faster code that relies on this precondition.
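To make the x + 1 > x point concrete, here is a minimal sketch. The function names are hypothetical and not from the question, and whether a given compiler actually folds the comparison depends on the compiler and its optimization settings.

```cpp
#include <climits>

// Hypothetical helper. For every x below INT_MAX the addition is well-defined
// and the comparison is true. For x == INT_MAX the addition overflows, which
// is undefined behaviour, so an optimizer is allowed to reduce the whole body
// to `return true;` without emitting any add or compare instruction.
bool increment_is_greater(int x) {
    return x + 1 > x;
}

// A well-defined way to ask "can I add 1?": compare before adding.
bool can_increment(int x) {
    return x < INT_MAX;
}
```

Calling increment_is_greater(41) is well-defined and yields true. Calling it with INT_MAX is exactly the case the Standard leaves undefined, so code that needs an answer there should use a pre-check such as can_increment instead.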

5

Overflow of signed integers is undefined in the C++ Standard, precisely because different compilers, assemblers and platforms might handle it differently.

You can reason about the behaviour of a program when you know the platform it's going to run on, but without that knowledge it's impossible to predict how it will behave.

The bitwise representation stays the same, the integer format remains the same

That doesn't have to be true at all.

Bartek Banachewicz
  • Thanks, can you share a situation in which case the bitwise representation will not lead to intmax + 1 = intmin, and intmin-1=intmax? – user87166 Nov 16 '14 at 12:33
  • @user87166 when the compiler optimises it out because it's always UB. – Nov 16 '14 at 12:35
  • @user87166 http://en.wikipedia.org/wiki/Signed_number_representations#Comparison_table – KoKuToru Nov 16 '14 at 12:36
  • if the reason is platform dependency, why is unsigned overflow well defined? – Karoly Horvath Nov 16 '14 at 12:52
  • @KarolyHorvath Probably because it raises less concerns with compatibility - I can't really imagine a platform using something different than all zeros for 0 and all ones for maximum. Storing the signature bit can be more arbitrary, though. – Bartek Banachewicz Nov 16 '14 at 12:54
  • In reality almost everywhere uses two's complement and unfortunately you see a lot of code relying on `-1 == UINT_MAX`. – sjdowling Nov 16 '14 at 13:36
  • @sjdowling That is because it is guaranteed (http://stackoverflow.com/questions/22801069/using-1-as-a-flag-value-for-unsigned-size-t-types/22801135#22801135) `(unsigned int) -1 == std::numeric_limits<unsigned int>::max()` Or do you mean something else? – Tim Seguine Nov 16 '14 at 20:04
0

IIRC, the reason this is undefined is because C++ doesn't mandate how numbers need to be stored by the target machine.

Let's assume 8 bits per byte/char and that char is signed. This would give us:

  • std::numeric_limits<char>::max()
    • 2's complement: 127 (0b01111111)
    • 1's complement: 127 (0b01111111)
    • Signed magnitude: 127 (0b01111111)
  • std::numeric_limits<char>::min()
    • 2's complement: -128 (0b10000000)
    • 1's complement: -127 (0b10000000)
    • Signed magnitude: -127 (0b11111111)

You can already see for the minimum values that we have different bit patterns and minimum values while the maximum values are the same.

So, what should happen if you add 1 to the maximum? Let's assume we cast to unsigned, add 1, cast back to signed. The result would be:

  • 2's complement: -128 (0b10000000)
  • 1's complement: -127 (0b10000000)
  • Signed magnitude: -0 (0b10000000)
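The three results above can be reproduced with a small sketch that does the addition on the unsigned bit pattern and then decodes that pattern by hand under each representation. The function names are hypothetical, and the decoders are plain arithmetic, so they do not depend on how the host machine stores signed integers.

```cpp
#include <cstdint>

// Add 1 to an 8-bit pattern using unsigned arithmetic, which wraps modulo 256
// and is therefore well-defined.
constexpr std::uint8_t increment_pattern(std::uint8_t bits) {
    return static_cast<std::uint8_t>(bits + 1u);
}

// Two's complement: a set top bit means "value minus 256".
constexpr int decode_twos_complement(std::uint8_t bits) {
    return (bits & 0x80) ? static_cast<int>(bits) - 256
                         : static_cast<int>(bits);
}

// Ones' complement: a negative value is the bitwise inverse of its magnitude.
constexpr int decode_ones_complement(std::uint8_t bits) {
    return (bits & 0x80) ? -static_cast<int>(static_cast<std::uint8_t>(~bits))
                         : static_cast<int>(bits);
}

// Sign-magnitude: the top bit is the sign, the low 7 bits are the magnitude.
// Negative zero (0b10000000) decodes to plain 0 here, since int has no -0.
constexpr int decode_sign_magnitude(std::uint8_t bits) {
    return (bits & 0x80) ? -static_cast<int>(bits & 0x7F)
                         : static_cast<int>(bits & 0x7F);
}

// 0b01111111 + 1 gives 0b10000000, which then means three different things.
static_assert(increment_pattern(0x7F) == 0x80, "127 + 1 as a bit pattern");
static_assert(decode_twos_complement(0x80) == -128, "two's complement");
static_assert(decode_ones_complement(0x80) == -127, "ones' complement");
static_assert(decode_sign_magnitude(0x80) == 0, "sign-magnitude negative zero");
```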

Quite a mess. But if we want to make the overflow well-defined, what can we do? Let's assume we have a signed char c = 127; and want to add 1. We could define that the result should always be -127 since that's what all three cited systems can represent (ignoring that these aren't the only systems to represent signed integers). But that would mean that compilers have to specifically catch that overflow and handle it correctly on 2's complement (the majority of systems) and signed magnitude systems which would mean extra instructions and thus less-than-ideal performance on those machines.
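As a rough illustration of that cost, here is what the hypothetical "always -127" rule could look like if written out by hand. The function name is made up, and a real compiler would implement such a rule in generated code rather than in a helper; the point is only that the overflow case needs its own comparison and branch.

```cpp
#include <limits>

// Hypothetical rule from the paragraph above: incrementing the maximum
// signed char yields -127 on every platform. The extra comparison is the
// overhead that machines whose native add behaves differently would pay.
constexpr signed char increment_with_fixed_wrap(signed char c) {
    return c == std::numeric_limits<signed char>::max()
               ? static_cast<signed char>(-127)
               : static_cast<signed char>(c + 1);
}

static_assert(increment_with_fixed_wrap(127) == -127, "overflow case");
static_assert(increment_with_fixed_wrap(5) == 6, "ordinary case");
```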

You are very unlikely to encounter a machine that is not using 2's complement in real life, so couldn't the C++ people simply mandate it? I haven't found any current CPU or DSP that uses anything other than 2's complement, but back when C++ was created there were machines using 1's complement (for example the CDC Cyber), and I wouldn't be surprised to hear that some DSPs still do today (after all, there are DSPs that have char sizes other than 8 bits). And that's why it stays undefined behaviour.

DarkDust