
Per the specification of strtol:

If the subject sequence has the expected form and the value of base is 0, the sequence of characters starting with the first digit shall be interpreted as an integer constant. If the subject sequence has the expected form and the value of base is between 2 and 36, it shall be used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus-sign, the value resulting from the conversion shall be negated. A pointer to the final string shall be stored in the object pointed to by endptr, provided that endptr is not a null pointer.

The issue at hand is that, prior to the negation, the value is not in the range of long. For example, in C89 (where the integer constant can't take on type long long), writing -2147483648 is possibly an overflow; you have to write (-2147483647-1) or similar.
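
As a small illustration of the `(-2147483647-1)` idiom mentioned above, here is a sketch that builds the minimum value without ever writing a positive constant that is out of range. The function name `sketch_long_min` is hypothetical, and the code assumes a two's-complement `long` (so that `LONG_MIN == -LONG_MAX - 1`):

```c
#include <limits.h>

/* Hypothetical helper, for illustration only.
 * -LONG_MAX is always representable, and subtracting 1 from it
 * reaches LONG_MIN on two's-complement implementations (assumed here). */
static long sketch_long_min(void)
{
    return -LONG_MAX - 1L;
}
```

This is the same pattern `<limits.h>` implementations typically use to define `LONG_MIN` itself.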

Since the wording using "integer constant" could be interpreted to apply the C rules for the type of an integer constant, this might be enough to save us from undefined behavior here, but the same issue (without such an easy out) would apply to strtoll.

Finally, note that even if it did overflow, the "right" value should be returned. So this question is really just about whether errno may or must be set in this case.

R.. GitHub STOP HELPING ICE
  • Why don't you update the question to be something that can't be determined by just running the code - instead of just putting an edit at the bottom. – xaxxon Jun 08 '13 at 20:14
  • Questions like this inherently can't be answered just by running the code. It's a question on the requirements of the C language (see the `language-lawyer` tag). – R.. GitHub STOP HELPING ICE Jun 08 '13 at 20:19
  • This question is raised in comp.std.c here https://groups.google.com/d/msg/comp.std.c/KOVzuLFen6Q/x2laO7KPCJ4J and C committee member Lawrence Jones says `errno` is not set to `ERANGE`. – ouah Jun 08 '13 at 20:41
  • @R.. I was suggesting you change the title of your question to be something that can't be done by testing which you only put in an edit at the bottom. – xaxxon Jun 08 '13 at 21:56
  • The title is correct usage of the word overflow. In most cases, "does X overflow?" can be answered in the positive but not in the negative by testing; since overflow usually results in undefined behavior, a test can never establish that overflow did not happen. In the case of `strtol`, the "overflow" would be internal to the implementation if it happens, and the function is required to report that via `errno`, so this should be testable in either direction, but then the test is not answering a question about the C language but about a particular implementation which may or may not be correct. – R.. GitHub STOP HELPING ICE Jun 09 '13 at 13:56
  • IMO, the "value" in "... the value resulting from the conversion shall be negated." is taken in the arithmetic sense, not confined to the `int` range. Thus the issue of overflow and `errno` does not apply until the final result is obtained. Nice Q – chux - Reinstate Monica Jul 04 '14 at 23:20

3 Answers


Although I cannot point to a particular bit of wording in the standard today, when I wrote strtol for 4BSD back in the 1990s I was pretty sure that this should not set errno, and made sure that my implementation would not. Whether this was based on wording in the standard, or personal discussion with someone, I no longer recall.

In order to avoid overflow, this means the calculation has to be done pretty carefully. I did it in unsigned long and included this comment (still in the libc source in the various BSDs):

    /*
     * Compute the cutoff value between legal numbers and illegal
     * numbers.  That is the largest legal value, divided by the
     * base.  An input number that is greater than this value, if
     * followed by a legal input character, is too big.  One that
     * is equal to this value may be valid or not; the limit
     * between valid and invalid numbers is then based on the last
     * digit.  For instance, if the range for longs is
     * [-2147483648..2147483647] and the input base is 10,
     * cutoff will be set to 214748364 and cutlim to either
     * 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
     * a value > 214748364, or equal but the next digit is > 7 (or 8),
     * the number is too big, and we will return a range error.
     *
     * Set 'any' if any `digits' consumed; make it negative to indicate
     * overflow.
     */
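
The cutoff/cutlim scheme described in that comment can be sketched roughly as follows. This is not the BSD source: the name `sketch_strtol10` is hypothetical, only base 10 is handled, and the final negation assumes a two's-complement `long`. It is meant only to show how accumulating in `unsigned long` lets the magnitude of `LONG_MIN` pass without a range error:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical base-10 sketch of the cutoff/cutlim technique. */
static long sketch_strtol10(const char *s, const char **end)
{
    const char *p = s;
    unsigned long acc = 0, cutoff;
    int neg = 0, any = 0, cutlim;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-') {
        neg = 1;
        p++;
    } else if (*p == '+') {
        p++;
    }

    /* Largest legal magnitude: |LONG_MIN| when negative, else LONG_MAX. */
    cutoff = neg ? 0UL - (unsigned long)LONG_MIN : (unsigned long)LONG_MAX;
    cutlim = (int)(cutoff % 10);
    cutoff /= 10;

    for (; isdigit((unsigned char)*p); p++) {
        int d = *p - '0';
        if (any < 0 || acc > cutoff || (acc == cutoff && d > cutlim)) {
            any = -1;                 /* too big; keep consuming digits */
        } else {
            any = 1;
            acc = acc * 10 + (unsigned long)d;
        }
    }

    if (end != NULL)
        *end = any ? p : s;           /* no digits: point back at the input */
    if (any < 0) {
        errno = ERANGE;
        return neg ? LONG_MIN : LONG_MAX;
    }
    if (neg && acc != 0)
        return -(long)(acc - 1) - 1;  /* never forms +|LONG_MIN| as a long */
    return (long)acc;
}
```

With a 32-bit `long`, the input `-2147483648` accumulates to 2147483648 in `acc`, passes the `cutlim` check (8 is not greater than 8), and comes back as `LONG_MIN` with `errno` untouched; `-2147483649` trips the check and reports `ERANGE`.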

I was (and still am, to some extent) annoyed by the asymmetry between this action in the C library and the syntax of the language itself, where negative numbers are two separate tokens, - followed by the number, so that writing -2147483648 means -(2147483648), which becomes -(2147483648U), which is of course 2147483648U and hence positive! (Assuming 32-bit int of course; the problematic value varies for other bit sizes.)

torek

On a 32-bit platform, -2147483648 is not an overflow under C89. strtol returns LONG_MIN and errno remains 0.

Quoting directly from the standard:

RETURN VALUE

Upon successful completion strtol() returns the converted value, if any. If no conversion could be performed, 0 is returned and errno may be set to [EINVAL]. If the correct value is outside the range of representable values, LONG_MAX or LONG_MIN is returned (according to the sign of the value), and errno is set to [ERANGE].

This is consistent with the following test:

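#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

int main(int argc, char *argv[]) {
    long val;
    int saved_errno;

    if (argc < 2) {
        fprintf(stderr, "usage: %s number\n", argv[0]);
        return 1;
    }
    errno = 0;                 /* strtol does not reset errno on success */
    val = strtol(argv[1], NULL, 10);
    saved_errno = errno;       /* fprintf may change errno, so save it now */
    fprintf(stderr, "long max: %ld, long min: %ld\n", LONG_MAX, LONG_MIN);
    fprintf(stderr, "val: %ld, errno: %d\n", val, saved_errno);
    errno = saved_errno;       /* restore so perror reports strtol's result */
    perror(argv[1]);
    return 0;
}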

Compile it on a 32-bit x86 system with:

gcc -std=c89 foo.c -o foo

Running it produces the following output:

./foo -2147483648

Output:

long max: 2147483647, long min: -2147483648
val: -2147483648, errno: 0
-2147483648: Success

./foo -2147483649

Output:

long max: 2147483647, long min: -2147483648
val: -2147483648, errno: 34
-2147483649: Numerical result out of range
Ahmed Masud
  • What do you mean by *"It's LONG_MIN for and errno == 0"* (seems incomprehensible)? E.g., is a word missing? Please respond by editing your answer, not here in comments (but *without* "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Sep 20 '22 at 13:42

Based on the comp.std.c thread cited in a comment by ouah (9 years ago), the intent is clearly that it does not overflow. The actual language in the standard is still ambiguous:

If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).

In order to get the right behavior, you have to interpret the phrase "interpreted as an integer constant according to the rules of 6.4.4.1" as yielding an actual integer value, not a value within some C-language integer type, and the final "in the return type" as the negation happening with a typeless integer value as the operand, but a coerced type for the result.
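
To make the two readings concrete, here is a rough sketch that contrasts a range check applied before negation with one applied to the final value. The function names are hypothetical, the magnitude is passed in as an `unsigned long` rather than parsed, and a two's-complement `long` is assumed:

```c
#include <limits.h>

/* Reading 1: the unsigned digit sequence itself must fit in long,
 * regardless of any leading minus sign. */
static int fits_before_negation(unsigned long magnitude)
{
    return magnitude <= (unsigned long)LONG_MAX;
}

/* Reading 2: only the final (mathematically negated) value must fit.
 * Assumes LONG_MIN == -LONG_MAX - 1, i.e. two's complement. */
static int fits_after_negation(unsigned long magnitude, int negative)
{
    unsigned long limit = (unsigned long)LONG_MAX;

    if (negative)
        limit += 1UL;   /* |LONG_MIN| is one larger than LONG_MAX */
    return magnitude <= limit;
}
```

For the magnitude `LONG_MAX + 1` with a minus sign, the first check fails and the second succeeds, which is exactly the case the thread is arguing about.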

Moreover, the error condition does not actually even define an "overflow" condition, but "correct value outside the range". This part of the text seems to be ignoring the unsigned issue addressed in DR006, since it only deals with the final value, not the pre-negation value:

If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

In short, this seems to still be a mess, due to the usual outcome where the committee says "yeah, it's supposed to mean what you think it should mean" and then never updates the ambiguous or outright wrong text in the standard...

R.. GitHub STOP HELPING ICE
  • *ICE* probably means [U.S. Immigration and Customs Enforcement](https://en.wikipedia.org/wiki/U.S._Immigration_and_Customs_Enforcement). – Peter Mortensen Feb 08 '23 at 04:44