41

The program below converts a string to long, but based on my understanding it also returns an error. I am relying on the fact that if strtol successfully converted string to long, then the second parameter to strtol should be equal to NULL. When I run the below application with 55, I get the following message.

./convertToLong 55
Could not convert 55 to long and leftover string is: 55 as long is 55

How can I successfully detect errors from strtol? In my application, zero is a valid value.

Code:

#include <stdio.h>
#include <stdlib.h>

static long parseLong(const char * str);

int main(int argc, char ** argv)
{
    printf("%s as long is %ld\n", argv[1], parseLong(argv[1]));
    return 0;
}

static long parseLong(const char * str)
{
    long _val = 0;
    char * temp;

    _val = strtol(str, &temp, 0);

    if(temp != '\0')
            printf("Could not convert %s to long and leftover string is: %s", str, temp);

    return _val;
}
Andreas Wenzel
Jimm
  • 1
    Read the documentation again; you also should handle errors like overflow. – Kerrek SB Jan 05 '13 at 20:36
  • 1
    Also, the proper error checking for `strto*` functions is not done by checking the output pointer. It should be done by checking for a zero return value and a set `errno`. –  Jan 05 '13 at 20:37
  • 2
    Why don't you use `std::stoi` in C++ ? (you added the C++ tag) – BatchyX Jan 05 '13 at 20:39
  • @BatchyX, It won't work quite as well for strings like "123abc" (as was the consensus in [my previous question](http://stackoverflow.com/questions/11598990/is-stdstoi-actually-safe-to-use)). The OP is checking for the entire string to be converted. – chris Jan 05 '13 at 20:48
  • @chris: You can do exactly the same thing with `std::stoi`. In fact, the prototype of `stoi` is almost the same as `strtol`, but uses exceptions where exceptions are due, instead of an error return value with global error variable hackery. – BatchyX Jan 05 '13 at 20:53
  • @BatchyX, True, but it's really annoying trying to see if the whole string was converted. I'd expect implementations to use `strtol` under the hood anyway, as one exception is based on a reported failure from `strtol`, but completely leave out converting the whole string in the checking. I find `boost::lexical_cast` a good substitute for that behaviour, though people have made a case against it as well. – chris Jan 05 '13 at 20:58
  • @chris: come on... doing that with `std::stoi` is just `if (*pos != string.length()) throw std::invalid_argument();`, and it will reuse your `invalid_argument` exception handler. And sometimes, you want to accept unconverted string if it begins with a space. – BatchyX Jan 05 '13 at 21:07
  • @BatchyX, Whatever works. I'm just surprised it doesn't do that in the first place, so you have to add your own code onto it if you want that functionality. – chris Jan 05 '13 at 21:19

5 Answers

78

Note that names beginning with an underscore are reserved for the implementation; it is best to avoid using such names in your code. Hence, _val should be just val.

The full specification of error handling for strtol() and its relatives is complex, surprisingly complex, when you first run across it. One thing you're doing absolutely right is using a function to invoke strtol(); using it 'raw' in code is probably not correct.

Since the question is tagged with both C and C++, I will quote from the C2011 standard; you can find the appropriate wording in the C++ standard for yourself.

ISO/IEC 9899:2011 §7.22.1.4 The strtol, strtoll, strtoul and strtoull functions

long int strtol(const char * restrict nptr, char ** restrict endptr, int base);

¶2 [...] First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. [...]

¶7 If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

Returns

¶8 The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

Remember that no standard C library function ever sets errno to 0. Therefore, to be reliable, you must set errno to zero before calling strtol().
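
To see why the reset matters, here is a minimal sketch. The helper name range_error_reported and its reset_errno flag are hypothetical, invented only for this illustration; only strtol(), errno and ERANGE come from the standard library.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: reports whether errno == ERANGE after strtol(str).
 * When reset_errno is false, whatever value errno held before the call
 * is left in place, which is the mistake being illustrated. */
static bool range_error_reported(const char *str, bool reset_errno)
{
    assert(str != NULL);      /* strtol(NULL, ...) is undefined behaviour */

    if (reset_errno)
        errno = 0;            /* clear any stale value first */

    (void)strtol(str, NULL, 10);

    /* strtol() never sets errno back to 0, so without the reset this
     * may reflect an error from some earlier, unrelated call. */
    return errno == ERANGE;
}
```

If earlier code left errno at ERANGE, calling this with "42" and reset_errno false would still report a range error, even though the conversion itself was fine; with reset_errno true it would not.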

So, your parseLong() function might look like:

#include <errno.h>   // errno, ERANGE
#include <limits.h>  // LONG_MIN, LONG_MAX

static long parseLong(const char *str)
{
    errno = 0;
    char *temp;
    long val = strtol(str, &temp, 0);

    // Fail on: nothing converted, trailing characters, or out-of-range value
    if (temp == str || *temp != '\0' ||
        ((val == LONG_MIN || val == LONG_MAX) && errno == ERANGE))
        fprintf(stderr, "Could not convert '%s' to long and leftover string is: '%s'\n",
                str, temp);
        // cerr << "Could not convert '" << str << "' to long and leftover string is '"
        //      << temp << "'\n";
    return val;
}

Note that on error, this returns 0 or LONG_MIN or LONG_MAX, depending on what strtol() returned. If your calling code needs to know whether the conversion was successful or not, you need a different function interface — see below. Also, note that errors should be printed to stderr rather than stdout, and error messages should be terminated by a newline \n; if they're not, they aren't guaranteed to appear in a timely fashion.

Now, in library code you probably do not want any printing, and your calling code might want to know whether the conversion was successful or not, so you might revise the interface too. In that case, you'd probably modify the function so it returns a success/failure indication:

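#include <errno.h>    // errno, ERANGE
#include <limits.h>   // LONG_MIN, LONG_MAX
#include <stdbool.h>  // bool (needed in C; built in for C++)
#include <stdlib.h>   // strtol

bool parseLong(const char *str, long *val)
{
    char *temp;
    bool rc = true;
    errno = 0;
    *val = strtol(str, &temp, 0);

    // Fail on: nothing converted, trailing characters, or out-of-range value
    if (temp == str || *temp != '\0' ||
        ((*val == LONG_MIN || *val == LONG_MAX) && errno == ERANGE))
        rc = false;

    return rc;
}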

which you could use like:

if (parseLong(str, &value))
    …conversion successful…
else
    …handle error…

If you need to distinguish between 'trailing junk', 'invalid numeric string', 'value too big' and 'value too small' (and 'no error'), you'd use an integer or enum instead of a boolean return code. If you want to allow trailing white space but no other characters, or if you don't want to allow any leading white space, you have more work to do in the function. The code allows octal, decimal and hexadecimal; if you want strictly decimal, you need to change the 0 to 10 in the call to strtol().
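
As a rough sketch of the enum variant described above: the type ParseStatus, its enumerator names, and the function name parseLongStatus are all hypothetical, chosen only for illustration. The checks themselves follow the same strtol() rules quoted from the standard.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes; the names are illustrative only. */
typedef enum {
    PARSE_OK,
    PARSE_INVALID,        /* nothing in the string looked like a number */
    PARSE_TRAILING_JUNK,  /* a number, followed by unconverted characters */
    PARSE_TOO_BIG,        /* value clamped to LONG_MAX */
    PARSE_TOO_SMALL       /* value clamped to LONG_MIN */
} ParseStatus;

static ParseStatus parseLongStatus(const char *str, long *val)
{
    assert(str != NULL && val != NULL);

    char *end;
    errno = 0;
    *val = strtol(str, &end, 0);

    if (end == str)
        return PARSE_INVALID;
    if (errno == ERANGE)   /* checked before trailing junk, by choice */
        return (*val == LONG_MIN) ? PARSE_TOO_SMALL : PARSE_TOO_BIG;
    if (*end != '\0')
        return PARSE_TRAILING_JUNK;
    return PARSE_OK;
}
```

Because the status is returned separately from the value, "0" comes back as PARSE_OK with *val == 0, so zero stays a valid result. The order of the checks is a design choice: here an out-of-range number followed by junk is reported as a range error.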

If your functions are to masquerade as part of the standard library, they should not set errno to 0 permanently, so you'd need to wrap the code to preserve errno:

int saved = errno;  // At the start, before errno = 0;

…rest of function…

if (errno == 0)     // Before the return
    errno = saved;
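
Put together, the errno-preserving wrapper might look like the following sketch. The name parseLongKeepErrno is hypothetical; the success test is the same one used in the bool version of parseLong() above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sketch: whole-string conversion that leaves the caller's errno
 * untouched unless strtol() itself reported an error. */
static bool parseLongKeepErrno(const char *str, long *val)
{
    assert(str != NULL && val != NULL);

    int saved = errno;    /* remember the caller's errno */
    char *temp;
    errno = 0;
    *val = strtol(str, &temp, 0);

    bool rc = !(temp == str || *temp != '\0' ||
                ((*val == LONG_MIN || *val == LONG_MAX) && errno == ERANGE));

    if (errno == 0)       /* strtol() set nothing: restore the old value */
        errno = saved;
    return rc;
}
```

After a successful call the caller sees the errno it had before; after an overflow it sees ERANGE.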
Jonathan Leffler
  • 1
    Thanks for the extensive answer! But why do you explicitly check for "errno == ERANGE" instead of "errno != 0"? If the user could specify an own base for conversion, errno could also be set to EINVAL... Also, "man strtol" (http://linux.die.net/man/3/strtol) uses the following code for error checking, and I really don't get the reason for this: "if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN)) || (errno != 0 && val == 0)){ error }". Why isn't this a simple "errno != 0" as well? – oliver Mar 17 '14 at 13:20
  • 4
    The standard doesn't mention setting `errno == EINVAL` for values of `base` other than `0` or `2`..`36`, but it is a reasonable thing to do. In general, you should be cautious about trying to detect error conditions with `errno` rather than the return from a function; the library can set `errno` to a non-zero value even if the function succeeds. (On Solaris, if the output was not a terminal, you'd find `errno == ENOTTY` after a successful operation.) In theory, `strtol()` could convert `"1"` to `1` and set `errno` to a non-zero value and this would be legitimate but perverted (and successful). – Jonathan Leffler Mar 17 '14 at 14:26
  • 2
    Is there a reason `errno == ERANGE` is checked unconditionally, whether `strtol` returned `LONG_MIN`/`LONG_MAX` or not? (For the reason you give in the comment, a library function may set `errno` on success.) – mafso Sep 28 '14 at 09:10
  • @mafso: Originally, some variation on the theme of exhaustion, laziness or carelessness. I've updated the answer to address your valid point, and miscellaneous other minor issues (spelling, etc). – Jonathan Leffler Sep 28 '14 at 21:59
  • 1
    There's an error in your example. `val` is a `long int *`, but you do the check `val == LONG_MIN`, it should be `*val == LONG_MIN`... – Joakim Dec 11 '14 at 00:24
  • @Joakim: Thanks! You're right; the second example was using `val` where `*val` was necessary. I've compiled the amended version of both samples under stringent warning options; they're probably OK now. – Jonathan Leffler Dec 11 '14 at 02:37
  • Why check `temp == str || *temp != '\0'` ? Isn't the check `temp == str` already covered? (If `temp` points to `str`, then it is not `'\0'` unless `str` is a null-string, in which case the conversion would also have failed...) Or am I missing something? – BmyGuest Oct 27 '16 at 08:22
  • @BmyGuest: Two different failure modes. If `temp == str`, then there was nothing in the string that was recognizable as a long. If `*temp != '\0'`, then there was a number, but it did not use the whole string — there was some other character after the number. You're at liberty to decide that it isn't a problem if you are given `"19Z"` and the `Z` isn't convertible; the test shown assumes it is a problem (the 'trailing junk' mentioned in the answer). But good question: it is important to understand what you're using. – Jonathan Leffler Oct 27 '16 at 13:15
  • 3
    Disagree with "the library can set errno to a non-zero value even if the function succeeds." C11 §7.5 3 discusses that but that does not apply to `strtol()` because "provided the use of errno is not documented in the description of the function" which `strtol()` does. `if (temp == str || *temp != '\0' || errno == ERANGE)` is sufficient. IMO `if (temp == str || *temp != '\0' || errno)` is better as it catches some ID extensions. The `(*val == LONG_MIN || *val == LONG_MAX)` are not needed. – chux - Reinstate Monica Apr 26 '17 at 19:34
  • @chux: That comment is subject to the 'In general' prefix; you're right that it doesn't apply when the use of `errno` is specified (so it doesn't apply to `strtol()`) and I don't explicitly say so. It gets tricky when the C standard only says `ERANGE` but some implementations might set `EINVAL` instead when the base is invalid. It's undefined behaviour to call the function with invalid values; you get what you get (setting the output pointer to the input pointer and returning 0 and setting `errno` to `EINVAL` is all reasonable if `base` is not `0` or `2`..`36`). – Jonathan Leffler Apr 26 '17 at 19:39
  • 2
    @JonathanLeffler Agree about `EINVAL` and so the suggested `temp == str || *temp != '\0' || errno` - I think we agree well there. Yet the comment is about the need for `*val == LONG_MIN || *val == LONG_MAX`, which is not enhanced given the _other_ `errno` possibilities. If `errno == ERANGE` is true, then even if `*val == LONG_MIN || *val == LONG_MAX` was false on some unicorn machine, the `strtol()` should still be considered as failed. – chux - Reinstate Monica Apr 26 '17 at 19:53
  • "Note that names beginning with an underscore are reserved for the implementation; it is best to avoid using such names in your code. Hence, _val should be just val." This isn't quite true AFAIK. The standard reserves names beginning with an underscore followed by either an underscore or a capital letter. So `__val` and `_Val` are reserved, but `_val` is not. – celticminstrel Dec 05 '22 at 03:13
  • @celticminstrel: For C, part of [C11 §7.1.3 Reserved identifiers](https://port70.net/~nsz/c/c11/n1570.html#7.1.3) says: — _All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use._ — _All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces._ See also [What does double underscore (`__const`) mean in C?](https://stackoverflow.com/q/1449181) Yes, you can use names that start with underscores; no, you won't always get away with it. – Jonathan Leffler Dec 05 '22 at 05:37
  • That sounds like it's safe to use an initial underscore followed by a lowercase character or a digit for local variables or class variables, but not for global file-static variables or variables in an anonymous namespace. I bet `std::placeholders` is one of the main reasons for that second rule… – celticminstrel Jan 23 '23 at 02:11
22

You're almost there. temp itself will not be null, but it will point to a null character if the whole string is converted, so you need to dereference it:

if (*temp != '\0')
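
A small sketch of the difference, using a hypothetical helper named consumedWholeString. In C, '\0' is an integer constant with value 0, so `temp != '\0'` compares the pointer itself against a null pointer; since strtol() always stores a pointer into the input string, that test is always true. Dereferencing looks at the character instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true if strtol() stopped at the string terminator.
 * It deliberately does nothing else, so it neither rejects an empty
 * string nor detects overflow. */
static bool consumedWholeString(const char *str)
{
    assert(str != NULL);

    char *temp;
    (void)strtol(str, &temp, 0);

    /* temp points into str, so it is never NULL here;
     * the character it points at is what matters. */
    return *temp == '\0';
}
```

Note that this check alone also accepts an empty string, because temp then equals str and already points at the terminator.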
chris
  • 5
    Additional checks are needed to handle overflows and parsing an empty string. See Jonathan Leffler's answer. – 0xF Feb 14 '14 at 11:44
7

How can I successfully detect errors from strtol?

static long parseLong(const char * str) {
    int base = 0;
    char *endptr;
    errno = 0;
    long val = strtol(str, &endptr, base);

3 tests specified/supported by the standard C library:

  1. Any conversion done?

     if (str == endptr) puts("No conversion.");
    
  2. In range?

     // Best to set errno = 0 before the strtol() call.
     else if (errno == ERANGE) puts("Input out of long range.");
    
  3. Trailing junk?

     else if (*endptr) puts("Extra junk after the numeric text.");
    

Success

    else printf("Success %ld\n", val);

Input such as str == NULL, or a base other than 0 or 2 to 36, is undefined behavior. Various implementations (extensions to the C library) provide defined behavior and report the problem via errno. We could add a 4th test.

    else if (errno) puts("Some implementation error found.");

Or combine with the errno == ERANGE test.


Sample terse code that also takes advantage of common implementation extensions.

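#include <errno.h>    // errno
#include <stdbool.h>  // bool
#include <stdlib.h>   // strtol

long my_parseLong(const char *str, int base, bool *success) {
    char *endptr = 0;
    errno = 0;
    long val = strtol(str, &endptr, base);

    if (success) {
        // Success: something was converted, no error was reported,
        // and the whole string was consumed.
        *success = endptr != str && errno == 0 && endptr && *endptr == '\0';
    }
    return val;
}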
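
The base argument changes which strings are accepted, which is easy to overlook. The following sketch uses a hypothetical helper, parseLongBase, that applies the same whole-string test with a caller-chosen base; the name and the assert on the base range are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: strict whole-string parse in the given base. */
static bool parseLongBase(const char *str, int base, long *val)
{
    assert(str != NULL && val != NULL);
    /* Any other base is undefined behaviour for strtol(). */
    assert(base == 0 || (base >= 2 && base <= 36));

    char *end;
    errno = 0;
    *val = strtol(str, &end, base);

    return end != str && *end == '\0' && errno == 0;
}
```

With base 0 the prefix decides the radix, so "010" is read as octal 8 and "0x1A" as hexadecimal 26. With base 10, "010" is 10 and "0x1A" is rejected, because conversion stops at the 'x'.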
chux - Reinstate Monica
4

You're missing a level of indirection. You want to check whether the character is the terminating NUL, and not if the pointer is NULL:

if (*temp != '\0')

By the way, this is not a good approach to error checking. Proper error checking for the strto* family of functions is not done by comparing the output pointer with the end of the string. It should be done by checking for a zero return value and checking the value of errno.

1

You should be checking

*temp != '\0'

You should also be able to check the value of errno after calling strtol(), according to the man page (man strtol):

RETURN VALUES
     The strtol(), strtoll(), strtoimax(), and strtoq() functions return the result
     of the conversion, unless the value would underflow or overflow.  If no
     conversion could be performed, 0 is returned and the global variable errno is
     set to EINVAL (the last feature is not portable across all platforms).  If an
     overflow or underflow occurs, errno is set to ERANGE and the function return
     value is clamped according to the following table.


       Function       underflow     overflow
       strtol()       LONG_MIN      LONG_MAX
       strtoll()      LLONG_MIN     LLONG_MAX
       strtoimax()    INTMAX_MIN    INTMAX_MAX
       strtoq()       LLONG_MIN     LLONG_MAX
spartygw
  • Citing from "the following table" does not make sense if you don't say where the "following table" can be found. – Roland Illig Oct 31 '20 at 19:00
  • Did you write this documentation yourself, or did you just forget to mention the source you copied it from? – Roland Illig Nov 04 '20 at 05:57
  • No it's a man page. Just "man strtol" on any unix based system. – spartygw Nov 04 '20 at 20:23
  • I'm just asking since the NetBSD man page looks quite different, even though it is a UNIX-like system. – Roland Illig Nov 05 '20 at 15:00
  • 1
    Furthermore, the question is tagged as "C, C++", therefore the proper reference is from the C or C++ standard, not from a particular implementation on a particular hardware architecture. – Roland Illig Nov 05 '20 at 15:01