
I'm trying to figure out which function would be best for converting a number parsed from the command line, whether decimal, hexadecimal, or octal, to an int, without knowing the input's format beforehand.

The goal is to use a single function that recognizes the different kinds of input and converts each one to its integer (int) value, which can then be used, so that:

./a.out 23 0xC4 070

could print

23
196 /*hexadecimal*/
56  /*octal*/

The only issue I can see is parsing to tell the difference between a decimal integer and an octal one.

Side question: is this approach reliable for converting the string to an integer for use?

Jonathan Leffler
user3362954
  • `atoi()` and `atol()` are very limited in relation to error recovery; `sscanf()` is too complex; use `strtol()` or `strtoul()`. – pmg Apr 04 '14 at 14:33
  • A leading 0 (zero) on an integer constant means octal; a leading 0x or 0X means hexadecimal. You should stick to this type of input if you can. – tesseract Apr 04 '14 at 14:33
  • "without knowing the input before hand" Do you mean you don't know the respective bases of the numbers? There's no general way to infer from the digits of a number what base it's in. E.g., "70" could be base 8 or base 10 or base 16. – alecbz Apr 04 '14 at 14:34
  • Without information on what a representation *means*, nobody can interpret it. Usually we use cultural folklore to communicate this information (e.g. "leading 0 for octal"); if you want something that diverges from that, you're a) going to upset everyone and b) have to write your own code. – Kerrek SB Apr 04 '14 at 14:35
  • Sorry, I was unsure about the leading '0' or '0x' for octal and hexadecimal; I do not intend to diverge from that. – user3362954 Apr 04 '14 at 14:38
  • If you pass in `0` as the `base` argument to `strtol` or `strtoul`, they will detect which base the input is in, but only to the extent that it's possible to tell, using the prefixes that tesseract mentioned. So if you want `70` to be treated as octal, you need to use `070`, and then `strtoul("070", NULL, 0)` will return 56 (decimal). I'm pretty sure `strtoul("C4", NULL, 0)` will return 196, despite the lack of a `0x` prefix, but only because the `C` gave away the fact that it was hexadecimal. – Mike Holt Apr 04 '14 at 14:41
  • @MikeHolt You should post that as an answer. I'm afraid `"C4"` won't be converted as hex though, the prefix is required. The rules are the same as for integer literals in source, which kind of makes sense. – unwind Apr 04 '14 at 14:44
  • @MikeHolt: no, it needs "0xC4" or "0XC4" – ysth Apr 04 '14 at 14:44
  • Yes, you're right. I just checked. – Mike Holt Apr 04 '14 at 14:46
  • See also: [Correct usage of `strtol()`](http://stackoverflow.com/questions/14176123/correct-usage-of-strtol/) amongst others. – Jonathan Leffler Apr 04 '14 at 14:50
  • To be clear, is part of your goal to know which base (8, 10, 16) the original was in and report that with "/*octal*/", "" or "/*hexadecimal*/"? Or is this simply a text to `int` conversion? – chux - Reinstate Monica Apr 04 '14 at 15:50
  • Possible duplicate of [Converting string to integer C](https://stackoverflow.com/questions/7021725/converting-string-to-integer-c) – Ciro Santilli OurBigBook.com Dec 20 '17 at 12:57

2 Answers


which function would be best to convert either a decimal, hexadecimal, or octal number to an int the best (?)

To convert such text to int, I recommend `long strtol(const char *nptr, char **endptr, int base);`, with additional range tests when converting the result to int, if needed.

Use 0 as the base so that the leading characters steer the conversion to base 10, 16 or 8 (credit: @Mike Holt).

With base 0, text is converted as follows:
Step 1: Skip optional white-space, such as `' '`, tab, `'\n'`, ... .
Step 2: Accept an optional sign: `'-'` or `'+'`.
Step 3:
  0x or 0X followed by hex digits --> hexadecimal
  0 --> octal
  else --> decimal
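As a sketch of those rules applied to the question's `./a.out 23 0xC4 070` case, the hypothetical helper `convert_args()` below (not a library function) runs each string through `strtol()` with base 0 and applies the same three error checks as the sample code. It assumes the strings come from something like `argv + 1`.

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: converts args[0..n-1] with strtol(..., 0), so
// "23", "0xC4" and "070" are read as decimal, hex and octal respectively.
// Stores the values in out[] and returns how many converted cleanly;
// it stops at the first string that has no digits, has trailing junk,
// or overflows a long.
int convert_args(int n, char *const args[], long out[]) {
  for (int i = 0; i < n; i++) {
    char *end;
    errno = 0;
    long v = strtol(args[i], &end, 0);  // base 0: the prefix picks the base
    if (end == args[i] || *end != '\0' || errno == ERANGE) {
      return i;
    }
    out[i] = v;
  }
  return n;
}
```

Called as `convert_args(argc - 1, argv + 1, values)` with the question's arguments, it should store 23, 196 and 56. A string such as `C4` is rejected: without the `0x` prefix, base 0 treats it as decimal and finds no digits.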

Sample code

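#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical placeholder: Handle_Error() is not a library function.
// Replace it with whatever recovery your program needs; this one
// reports the problem and exits.
static void Handle_Error(void) {
  fprintf(stderr, "mystrtoi: invalid or out-of-range input\n");
  exit(EXIT_FAILURE);
}
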
int mystrtoi(const char *str) {
  char *endptr;
  errno = 0;
  //                                   v--- determine conversion base
  long long_var = strtol(str, &endptr, 0);
  //   out of range   , extra junk at end,  no conversion at all   
  if (errno == ERANGE || *endptr != '\0' || str == endptr) {
    Handle_Error();
  }

  // Needed when `int` and `long` have different ranges
  #if LONG_MIN < INT_MIN || LONG_MAX > INT_MAX
  if (long_var < INT_MIN || long_var > INT_MAX) {
    errno = ERANGE;
    Handle_Error();
  }
  #endif

  return (int) long_var;
}

atoi vs atol vs strtol vs strtoul vs sscanf ... to int

atoi()
Pro: Very simple.
Pro: Converts to an int.
Pro: In the C standard library.
Pro: Fast.
Con: On out-of-range input, the behavior is undefined. @chqrlie
Con: Handles neither hexadecimal nor octal.

atol()
Pro: Simple.
Pro: In the C standard library.
Pro: Fast.
Con: Converts to a long, not an int, and the two may differ in size.
Con: On out-of-range input, the behavior is undefined.
Con: Handles neither hexadecimal nor octal.

strtol()
Pro: Simple.
Pro: In the C standard library.
Pro: Good error handling.
Pro: Fast.
Pro: Can handle binary (any base from 2 to 36).
Con: Converts to a long, not an int, and the two may differ in size.

strtoul()
Pro: Simple.
Pro: In the C standard library.
Pro: Good error handling.
Pro: Fast.
Pro: Can handle binary.
---: Does not complain about negative numbers ("-1" converts to ULONG_MAX).
Con: Converts to an unsigned long, not an int, and the two may differ in size.

sscanf(..., "%i", ...)
Pro: In the C standard library.
Pro: Converts to int.
---: Middle-of-the-road complexity.
Con: Potentially slow.
Con: Only basic error handling (overflow is undefined behavior).
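To make the `atoi()` and `sscanf()` entries concrete: `atoi("0xC4")` returns 0 and `atoi("070")` returns 70, because it reads decimal digits only. `sscanf()` with `"%i"` follows the same prefix rules as `strtol()` with base 0. Below is a minimal sketch with a hypothetical `parse_with_sscanf()` wrapper that uses `%n` to detect trailing junk; it still cannot detect overflow.

```c
#include <stdio.h>

// Hypothetical wrapper: parse a whole string as an int with "%i", which
// accepts decimal, 0x/0X hexadecimal and 0-prefixed octal.
// Returns 1 on success, 0 if nothing was converted or junk follows.
// Caveat: if the value does not fit in an int, the behavior is
// undefined; sscanf() gives no way to detect that.
int parse_with_sscanf(const char *s, int *out) {
  int value;
  int used = 0;  // characters consumed so far, recorded by %n
  if (sscanf(s, "%i%n", &value, &used) != 1) {
    return 0;    // no conversion, or empty input (EOF)
  }
  if (s[used] != '\0') {
    return 0;    // trailing junk, e.g. "12abc"
  }
  *out = value;
  return 1;
}
```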

All are affected, for better or worse, by locale settings. C11 §7.22.1.4 ¶6: “In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.”


Additional credits:
@Jonathan Leffler: `errno` test against `ERANGE`, `atoi()` being decimal-only, and the discussion of `errno` in multi-threaded code.
@Marian: speed issue.
@Kevin: library inclusiveness.


For converting to short, signed char, etc., consider strto_subrange().

chux - Reinstate Monica
  • `stroul` typo for `strtoul`. What do you mean 'not re-entrant' (or, more accurately, why isn't it re-entrant)? – Jonathan Leffler Apr 04 '14 at 14:55
  • @Jonathan Leffler "not re-entrant" is confusing. My concern is that `errno` needs to be cleared, `strtox()` called and `errno` tested, but `errno` may change due to other concurrent processes. – chux - Reinstate Monica Apr 04 '14 at 15:05
  • Major point against `sscanf` is that it is horribly slow compared to the others. – Marian Apr 04 '14 at 15:06
  • @Marian Concerning speed: some compilers will analyze `scanf()` formats and adjust the `scanf()` code. I've seen this in embedded designs, especially to determine whether the floating-point package is needed. If all `scanf()` calls are known to use a limited repertoire of formats, optimizations occur that greatly improve speed and reduce code footprint. – chux - Reinstate Monica Apr 04 '14 at 15:56
  • You say '`errno` may change due to other concurrent processes'. Assuming that s/processes/threads/ matches what you meant, it remains true that in a threaded environment, `errno` is thread-specific and hence there isn't (shouldn't be) a problem. ISO/IEC 9899:2011 Section 7.5 Errors `<errno.h>` says: _`errno` which expands to a modifiable lvalue that has type `int` and thread local storage duration, the value of which is set to a positive error number by several library functions._ Footnote 201 explains that `errno` need not be the identifier of an object (e.g. it could be `*errno()`). – Jonathan Leffler Apr 04 '14 at 16:22
  • @Jonathan Leffler Good to know that `errno` can be expected to be thread-specific; I had fuzzy info on that. BTW: all these methods may be affected by `locale`. “In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.” Would you think `locale` is thread safe? – chux - Reinstate Monica Apr 04 '14 at 17:06
  • Questions about locale definitely complicate matters; I would have to research that, and it likely depends on the implementation. If you go mucking around with multiple locales and multiple threads, you definitely have a lot of worrying to do. (ISO/IEC 9899:2011 Section 7.11.1.1 **The `setlocale` function** says: _A call to the `setlocale` function may introduce a data race with other calls to the `setlocale` function or with calls to functions that are affected by the current locale._ But that's probably only the beginning of the issues.) – Jonathan Leffler Apr 04 '14 at 17:33
  • I'm confused by the strong `atoi` vs. `strtol` disambiguation. I took a look in my stdlib.h and `atoi` is implemented simply as `return (int) strtol (__nptr, (char **) NULL, 10);` (gcc 5.4.0). What is the difference, really? – Artur Czajka Mar 07 '17 at 09:17
  • @ArturCzajka 1) `(int) strtol (__nptr, (char **) NULL, 10);` loses information should the conversion produce a `long` outside the range of `int`. 2) The `(char **) NULL` loses information about where the conversion stopped. 3) On error, the implementation does not need to follow `(int) strtol (__nptr, (char **) NULL, 10);`, even if it does so on the compiler you used today; on error, the behavior is undefined. `strtol()` does not have these shortcomings. – chux - Reinstate Monica Mar 07 '17 at 14:47
  • Instead of a laconic phrase *Con: No error handling*, you might want to underscore the problem with `atoi` and `atol`, ie: quoting the C Standard: *If the value of the result cannot be represented, the behavior is undefined.* – chqrlie Feb 22 '22 at 23:11
  • @chqrlie OK. Post edited. – chux - Reinstate Monica Feb 23 '22 at 00:01

If you care about error conditions, the only sensible choices are strtol() and strtoul() (or strtoll() or strtoull() from <stdlib.h>, or perhaps strtoimax() or strtoumax() from <inttypes.h>). If you don't care about errors on overflow, any of them could be used. Neither atoi() nor atol() nor sscanf() gives you control if the values overflow. Additionally, neither atoi() nor atol() supports hex or octal input (so in fact you can't use them to meet your requirements).

Note that calling the strtoX() functions is not entirely trivial. You have to set errno to 0 before calling them, pass a pointer to get the end location, and analyze the results carefully to know what happened. Remember, every possible return value from these functions is a valid output, but some of them may also indicate invalid input; errno and the end pointer help you distinguish between all the cases.
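A sketch of that analysis, using a hypothetical `convert_long()` wrapper and `conv_status` codes invented for this example. It shows why both `errno` and the end pointer are needed: a return of 0 could mean the input was `0` or that there were no digits at all, and `LONG_MAX` could be a genuine value or a clamped overflow.

```c
#include <errno.h>
#include <limits.h>  // LONG_MIN, LONG_MAX
#include <stdlib.h>

// Hypothetical status codes for this sketch.
enum conv_status { CONV_OK, CONV_NO_DIGITS, CONV_TRAILING, CONV_RANGE };

// Hypothetical wrapper: strtol() with base 0, then errno and the end
// pointer tell apart results that would otherwise look the same
// (0 from "0" vs. 0 from "abc"; a real LONG_MAX vs. a clamped one).
enum conv_status convert_long(const char *s, long *out) {
  char *end;
  errno = 0;                    // no library function sets errno to 0
  long v = strtol(s, &end, 0);
  *out = v;                     // LONG_MIN/LONG_MAX if the value overflowed
  if (end == s) {
    return CONV_NO_DIGITS;      // nothing converted; v is 0
  }
  if (errno == ERANGE) {
    return CONV_RANGE;          // overflow; v was clamped
  }
  if (*end != '\0') {
    return CONV_TRAILING;       // digits followed by junk, e.g. "12abc"
  }
  return CONV_OK;
}
```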

If you need to convert to int after reading the value using, say, strtoll(), you can check the range of the returned value (stored in a long long) against the range defined in <limits.h> for int: INT_MIN and INT_MAX.
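A minimal sketch of that approach; `to_int_checked()` and its 0/-1 return convention are assumptions for illustration, not part of any library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper: convert with strtoll() (base 0), then narrow to
// int only after checking the result against INT_MIN..INT_MAX.
// Returns 0 on success, -1 on any error.
int to_int_checked(const char *s, int *out) {
  char *end;
  errno = 0;
  long long v = strtoll(s, &end, 0);
  if (end == s || *end != '\0' || errno == ERANGE) {
    return -1;  // no digits, trailing junk, or long long overflow
  }
  if (v < INT_MIN || v > INT_MAX) {
    return -1;  // fits in long long but not in int
  }
  *out = (int) v;
  return 0;
}
```

The second range check is what catches input such as `5000000000`, which fits in a `long long` but not in a 32-bit `int`.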

For full details, see my answer at: Correct usage of strtol().

Note that none of these functions tells you which base was used; you'll need to analyze the string yourself. Quirky note: did you know that there is no decimal 0 in C source? When you write 0, you are writing an octal constant (because its first digit is a 0). There are no practical consequences to this piece of trivia.
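If you do want the `/*hexadecimal*/` and `/*octal*/` labels from the question, one way is to repeat the prefix test yourself. This is a sketch with a hypothetical `detect_base()` helper that mirrors the base-0 rules (white space, sign, then prefix); it only reports the base and leaves validation to `strtol()`.

```c
#include <ctype.h>

// Hypothetical helper: reports which base strtol(s, &end, 0) would use.
// Skips leading white space and an optional sign; then "0x"/"0X" followed
// by a hex digit means 16, any other leading '0' means 8 (so "0" itself
// counts as octal), and anything else means 10.
// It does not check that the rest of the string is valid.
int detect_base(const char *s) {
  while (isspace((unsigned char) *s)) {
    s++;
  }
  if (*s == '+' || *s == '-') {
    s++;
  }
  if (s[0] == '0') {
    if ((s[1] == 'x' || s[1] == 'X') && isxdigit((unsigned char) s[2])) {
      return 16;
    }
    return 8;
  }
  return 10;
}
```

It reports 8 for `"0"`, which matches the trivia above.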

Community
Jonathan Leffler
  • When the value entered is larger (or smaller) than will fit into the integer type. The returned value is clamped to the end of the range supported by the type, but `errno == ERANGE` indicates that the overflow occurred. If you've got 32-bit `long` values, this could happen on a value 5,000,000,000 (minus the commas), even though all the digits were used. – Jonathan Leffler Apr 04 '14 at 14:53
  • Clamping to the "end of the range supported by the type" has an undiscussed nuance. With `strtoul()` a leading `-` is applied after the conversion. `"-1"` becomes `ULONG_MAX`, not clamped to 0. – chux - Reinstate Monica Mar 29 '20 at 20:25
  • @chux-ReinstateMonica: Yes, but when is that an issue? Suppose `sizeof(unsigned int) != sizeof(unsigned long)` and you want an `unsigned int` value. You use `strtoul()` to process the string `"-1"`. As you say, `strtoul()` returns `ULONG_MAX`, which is compared with `UINT_MAX`, found to be bigger, and clamping returns `UINT_MAX`, which is what would be expected, is it not? So that's not a problem. If `sizeof(unsigned int) == sizeof(unsigned long)`, there's not a problem. I'm not clear when it would cause an issue. You shouldn't use an unsigned converter for a signed value or vice versa. – Jonathan Leffler Mar 29 '20 at 22:51
  • `"-1"` taking on the value of `UINT_MAX` can be seen as a contradiction to your [returned value is clamped to the end of the range supported by the type](https://stackoverflow.com/questions/22865622/atoi-vs-atol-vs-strtol-vs-strtoul-vs-sscanf/22865995?noredirect=1#comment34886258_22865995) as it is not 0. The point being with `strtol()` the `-` is considered before clamping, with `strtoul()` it is afterwards and so does not set `errno`. – chux - Reinstate Monica Mar 29 '20 at 22:56
  • @chux-ReinstateMonica: I think there's a simpler, and better, alternative argument, using a negative number other than `-1`, such as `-2` where `sizeof(unsigned int) < sizeof(unsigned long)`. A true `strtoui()` would return `UINT_MAX - 1`, I think, but clamping returns `UINT_MAX`. I need to think (hard) about this. Maybe using `strtoul()` to implement `strtoui()` with different sizes requires `strtoui()` to skip white space, track a negative sign, check for a digit after the sign, pass the string to `strtoul()` without the sign, monitor the range of the return, and apply the sign itself. – Jonathan Leffler Mar 29 '20 at 23:13
  • I dabbled into `strtoul()` clamping [here](https://stackoverflow.com/q/60955490/2410359). – chux - Reinstate Monica Mar 31 '20 at 21:38
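Along the lines of the comments above, here is a sketch of one way to keep `strtoul()` from silently turning `"-1"` into `ULONG_MAX` with `errno` left at 0: look for a leading `-` (after white space) and reject it before calling `strtoul()`. Rejecting negatives outright is an assumption of this example; the `strtoui()` design discussed above would instead track the sign and apply it after a range check. The function name is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: unsigned conversion (base 0) that rejects any
// leading '-', because strtoul() negates after converting, so "-1"
// would otherwise yield ULONG_MAX without reporting ERANGE.
// Returns 0 on success, -1 on any error.
int to_ulong_no_negatives(const char *s, unsigned long *out) {
  const char *p = s;
  while (isspace((unsigned char) *p)) {
    p++;                        // same white-space skipping as strtoul()
  }
  if (*p == '-') {
    return -1;                  // reject before strtoul() can wrap it
  }
  char *end;
  errno = 0;
  unsigned long v = strtoul(s, &end, 0);
  if (end == s || *end != '\0' || errno == ERANGE) {
    return -1;                  // no digits, trailing junk, or overflow
  }
  *out = v;
  return 0;
}
```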