20

Consider following program (C99):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main(void)
{
    printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
    intmax_t i;
    if (scanf("%jd", &i) == 1)
        printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}

Now as I understand it, this contains easily triggerable undefined behaviour, like this:

Enter int in range -9223372036854775808 .. 9223372036854775807:
 > -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808

Questions:

  1. Is this really undefined behaviour, as in "code is allowed to trigger any code path, which any code that stroke compiler's fancy", when user enters the bad number? Or is it some other flavor of not-completely-defined?

  2. How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?

(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)

hyde
  • 60,639
  • 21
  • 115
  • 176
  • Note that entering an int out of range causes undefined behaviour also. If you want to avoid UB you can't use any flavour of `%d` or other integer or floating point scanf specifiers. Use the `strto` family . And there's only one flavour of undefined behaviour , the bad one. – M.M Feb 07 '16 at 10:23
  • @M.M There is also implementation defined behavior, unspecified but valid value, and maybe some other milder alternatives to undefined behavior. But, do I misunderstand, or are you saying that scanf for signed or floating point number implicitly contains user-triggerable UB? Reference? – hyde Feb 07 '16 at 13:51
  • Yes, the user can trigger UB by entering a value out of range for the integer being scanned into. See the specification of `fscanf` in the C Standard. In C11 it is 7.21.6.2/10, " if the result of the conversion cannot be represented in the object, the behavior is undefined". So the `scanf` family are, for the most part, not suitable for use in production – M.M Feb 07 '16 at 22:30
  • I recall in my introductory programming class many years ago the first assignment was to write a program to add two numbers, which could be positive or negative. I dutifully wrote the code, then realized there could be overflow and underflow, so I then wrote code to detect that and inform the user if it occurred. I imagine a similar thing could be done to satisfy your second question. – Michael Feb 08 '16 at 01:10

7 Answers7

10

If the result of imaxabs cannot be represented, can happen if using two's complement, then the behavior is undefined.

7.8.2.1 The imaxabs function

  1. The imaxabs function computes the absolute value of an integer j. If the result cannot be represented, the behavior is undefined. 221)

221) The absolute value of the most negative number cannot be represented in two’s complement.

The check that makes no assumptions and is always defined is:

intmax_t i = ... ;
if( i < -INTMAX_MAX )
{
    //handle error
}

(This if statement cannot be taken if using one's complement or sign-magnitude representation, so the compiler might give a unreachable code warning. The code itself is still defined and valid. )

2501
  • 25,460
  • 4
  • 47
  • 87
  • Thanks for a good solution, it was a hard choice, but after some deliberation I still chose to accept the other answer, which shows how to print the correct result. – hyde Feb 07 '16 at 16:05
  • 2
    @hyde Except that the other answer is not standards compliant while this one is. – Voo Feb 07 '16 at 17:05
  • Is it guaranteed that `-INTMAX_MAX` doesn't overflow? – nwellnhof Feb 07 '16 at 17:15
  • @nwellnhof It is guaranteed. See my other comment: https://stackoverflow.com/questions/35251410/c-safely-taking-absolute-value-of-integer/35251846?noredirect=1#comment58217467_35251501 – 2501 Feb 07 '16 at 17:17
7

How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?

One method is to use unsigned integers. The overflow behaviour of unsigned integers is well-defined as is the behaviour when converting from a signed to an unsigned integer.

So I think the following should be safe (turns out it's horriblly broken on some really obscure systems, see later in the post for an improved version)

uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
  j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);

So how does this work?

uintmax_t j = i;

This converts the signed integer into an unsigned one. IF it's positive the value stays the same, if it's negative the value increases by 2n (where n is the number of bits). This converts it to a large number (larger than INTMAX_MAX)

if (j > (uintmax_t)INTMAX_MAX) {

If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.

  j = -j;

The number is negated. The result of a negation is clearly negative and so cannot be represented as an unsigned integer. So it is increased by 2n.

So algebraically the result for negative i looks like

j = - (i + 2n) + 2n = -i


Clever, but this solution makes assumptions. This fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by C Standard.

Hmm, lets look at this (i'm reading https://busybox.net/~landley/c99-draft.html which is apprarently the last C99 draft prior to standardisation, if anything changed in the final standard please do tell me.

When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.

In 6.2.5 I see

For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.

In 6.2.6.2 I see

#1

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N-1, so that >objects of that type shall be capable of representing values from 0 to 2N-1 >using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)

#2

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.

So yes it seems you are right, while the signed and unsigned types have to be the same size it does seem to be valid for the unsigned type to have one more padding bit than the signed type.


Ok, based on the analysis above revealing a flaw in my first attempt i've written a more paranoid variant. This has two changes from my first version.

I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means that the algorithm proceduces correct results for numbers grater than or equal to -INTMAX_MAX even when INTMAX_MAX == UINTMAX_MAX.

I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX -1 and i == INTMAX_MIN. This will result in j=0 inside the if condition which we can easilly test for.

It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.

uintmax_t j = i;
if (i < 0) {
  j = -j;
  if (j == 0) {
    printf("your platform sucks\n");
    exit(1);
  }
}
printf("Result: |%jd| = %ju\n", i, j);

@plugwash I think 2501 is correct. For example, -UINTMAX_MAX value becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde 58 mins ago

Umm,

assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX

uintmax_t j = i;

after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1

if (i < 0) {

i is less than zero so we run the commands inside the if

j = -j;

after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX

which is the correct answer, so no need to trap it in an error case.

plugwash
  • 9,724
  • 2
  • 38
  • 51
  • I choose to accept this, as this will actually display correct result even for `INTMAX_MIN` value. – hyde Feb 07 '16 at 16:01
  • 5
    Clever, but this solution makes assumptions. This fails if `INTMAX_MAX == UINTMAX_MAX`, which is allowed by C Standard. – 2501 Feb 07 '16 at 16:21
  • @2501 Is that possible? I'm under impression, *which could be wrong*, that casting signed to corresponding unsigned type must not lose bits, and therefore if signed value is negative, the produced unsigned value must be more than signed maximum. – hyde Feb 07 '16 at 16:48
  • 3
    @hyde The paragraph *C11 6.2.6.2, p2* says there may be the same number of value bits in an unsigned integer than there are in the corresponding signed integer.(note: M<=N). In which case the range of the signed integer is actually bigger, because the signed integer has an extra sign bit, that gives it the negative range – 2501 Feb 07 '16 at 16:52
  • @2501 In any case, this will still not cause undefined behaviour, just wrong result (`INTMAX_MAX-1`, if I did my bit math correctly?), which could be tested for prior, or sometimes just ignored as "close enough" depending on situation. – hyde Feb 07 '16 at 16:59
  • Another question, are there any 2's complement CPUs, which also have `INTMAX_MAX == UINTMAX_MAX`? (I note that this is out of the scope of my question, which specifically asks for things guaranteed by the standard). – hyde Feb 07 '16 at 17:01
  • 1
    @hyde 1. Yes, just the wrong result. 2. I don't know of any. :), I think it is more of a theoretical concern. You can always add an #ifdef for that unlikely scenario, and use this code if you like. – 2501 Feb 07 '16 at 17:14
  • @2501 I added a more paranoid version at the end of my post including error reporting for the case where the absolute value of a INTMAX_MIN cannot be represented in uintmax_t . Does it look right to you now or are there more holes you can pick? – plugwash Feb 07 '16 at 17:19
  • @plugwash Yeah, my previous three comments aren't correct for those values, so I removed them. – 2501 Feb 07 '16 at 18:03
  • The language lawyer style code in this example is exactly the kind of fragile, hard to understand tricky coding one should avoid!! There's far simpler, more obviously correct solutions which special case the problem range (in this case i == INTMAX_MIN). – Rob11311 Feb 07 '16 at 18:32
  • @2501 I don't think you are correct. The original version of my code relied on j=i producing distinct results for positive and negative numbers but the "more paranoid" version changed the test to (i<0) so positive and negative numbers no longer need to produce distinct values for j. – plugwash Feb 07 '16 at 18:34
  • @plugwash Yes, you are correct, I was testing with the incorrect version, and I also had an incorrectly placed brackets in my test. Sorry. :-( – 2501 Feb 07 '16 at 23:41
4

On two-complement systems getting the absolute number of the most negative value is indeed undefined behavior, as the absolute value would be out of range. And it's nothing the compiler can help you with, as the UB happens at run-time.

The only way to protect against that is to compare the input against the most negative value for the type (INTMAX_MIN in the code you show).

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • 1
    This covers two's complement (and loses only one valid number for one's complement), but I find it a good question if it can be detected in a reliable way no matter the integer representation (I presume the standard doesn't restrict to only one's and two's complement although I must admit I've never checked) – Joachim Isaksson Feb 07 '16 at 08:57
  • 6
    @JoachimIsaksson: The standard restricts to one of three options: two's complement, ones' complement and sign-magnitude. (C99, 6.2.6.2, para 2.) – Mark Dickinson Feb 07 '16 at 09:01
  • 3
    @JoachimIsaksson `if( i < -INTMAX_MAX )` works for any representation. Although you might get a compiler warning on one's complement and sign-magnitude since the statement cannot be taken. I don't know how to prevent that. – 2501 Feb 07 '16 at 09:29
  • "And it's nothing the compiler can help you with, as the UB happens at run-time." the compiler could produce code that performs run-time checks ;-) – coredump Feb 07 '16 at 10:02
  • @skyking Standard doesn't define that type. Can you quite the standard, so it is clearer what you meant.? Or better yet post an answer as a rebuttal to mine. (I'm interested, perhaps I'm wrong, but I don't see it.) – 2501 Feb 07 '16 at 10:31
  • 4
    @skyking According to C, for any signed type, -MAX must be representable: *C11 6.2.6.2, p2*, because signed integers must be one of those three representations, which guarantee those ranges. It is impossible for a signed integer to have its maximum value be larger than the absolute minimum value. – 2501 Feb 07 '16 at 10:43
2

So calculating the absolute value of an integer invokes undefined behaviour in one single case. Actually, while the undefined behaviour can be avoided, it is impossible to give the correct result in one case.

Now consider multiplication of an integer by 3: Here we have a much more serious problem. This operation invokes undefined behaviour in 2/3rds of all cases! And for two thirds of all int values x, finding an int with the value 3x is just impossible. That's a much more serious problem than the absolute value problem.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
1

You may want to use some bit hacks:

int v;           // we want to find the absolute value of v
unsigned int r;  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

This works well when INT_MIN < v <= INT_MAX. In the case where v == INT_MIN, it remains INT_MIN , without causing undefined behavior.

You can also use bitwise operation to handle this on ones' complement and sign-magnitude systems.

Reference: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs

nalzok
  • 14,965
  • 21
  • 72
  • 139
  • I believe right shifting a signed integer is in itself UB. – abligh Feb 07 '16 at 10:52
  • 1
    @abligh If the signed integer is negative, it is implementation defined. This answer also assumes no padding bits. – 2501 Feb 07 '16 at 10:54
  • According to the bit hacks file, this branchless solution relies on 2's complement, but has also been patented in the US, which may also be a problem. – Rob11311 Feb 07 '16 at 18:51
0

according to this http://linux.die.net/man/3/imaxabs

Notes

Trying to take the absolute value of the most negative integer is not defined.

To handle the full range you could add something like this to your code

    if (i != INTMAX_MIN) {
        printf("Result: |%jd| = %jd\n", i, imaxabs(i));
    } else {  /* Code around undefined abs( INTMAX_MIN) /*
        printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
    }

edit: As abs(INTMAX_MIN) cannot be represented on a 2's complement machine, 2 values within respresentable range are concatenated on output as a string. Tested with gcc, though printf required %lld as %jd was not a supported format.

Community
  • 1
  • 1
Ilan Kutsman
  • 469
  • 3
  • 9
  • what is `imax(i+1)+1` and what is it supposed to achieve? – Pascal Cuoq Feb 07 '16 at 09:54
  • i meant to write imaxabs., i'll fix it. it's supposed to give the correct result of an absolute of INTMAX_MIN. Just trying to think out of the box here – Ilan Kutsman Feb 07 '16 at 09:58
  • 1
    `imaxbas(i+1)+1` is not a workaround, it just pushes the undefined behavior into the second addition. The fundamental reason `imaxabs(INTMAX_MIN)` is undefined on 2's complement machine is that the correct result is not representable. No amount of adding one twice will change this basic fact. – Pascal Cuoq Feb 07 '16 at 10:03
  • OK, slight change ,imaxabs(INTMAX_MIN+1) is representable by a 2's complement machine.right? Now you turn it to a string and increment the last char before '\0'. – Ilan Kutsman Feb 07 '16 at 10:13
  • 1
    It's easier to use div & mod to put INTMAX_MIN into a negateable range though – Rob11311 Feb 07 '16 at 18:39
-1
  1. Is this really undefined behaviour, as in "code is allowed to trigger any code path, which any code that stroke compiler's fancy", when user enters the bad number? Or is it some other flavor of not-completely-defined?

The behaviour of the program is only undefined, when the bad number is successfully input-ed and passed to imaxabs(), which on a typical 2's complement system returns a -ve result as you observed.

That is the undefined behaviour in this case, the implementation would also be allowed to terminate the program with an over-flow error if the ALU set status flags.

The reason for "undefined behaviour" in C is so compiler writers don't have to guard against overflow, so programs can run more efficiently. Whilst it is within C standard for every C program using abs() to try to kill your first born, just because you call it with a too -ve value, writing such code into the object file would simply be perverse.

The real problem with these undefined behaviours, is that an optimising compiler, can reason away naive checks so code like :

r = (i < 0) ? -i : i;
if (r < 0) {   // This code may be pointless
    // Do overflow recovery
    doRecoveryProcessing();
} else {
    printf("%jd", r);
}

As a compiler optomiser can reason that negative values are negated, it could in principal determine that (r <0) is always false, so the attempt to trap the problem fails.

  1. How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?

By far the best way, is simply to ensure that the program works on a valid range, so in this case validating the input suffices (disallow INTMAX_MIN). Programs printing tables of abs() ought to avoid INT*_MIN and so on.

    if (i != INTMAX_MIN) {
        printf("Result: |%jd| = %jd\n", i, imaxabs(i));
    } else {  /* Code around undefined abs( INTMAX_MIN) /*
        printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
    }

Appears to write out the abs( INTMAX_MIN) by fakery, allowing the program to live up to it's promise to the user.

Rob11311
  • 1,396
  • 8
  • 10