
I'm trying to write a helper function that can be used for parsing integers from config files and from a text-based protocol (written by machine, not by a human). I've read "How to parse a string to an int in C++?", but the solutions there don't address all the issues. I would like something that will (from most to least important):

  1. Reject out-of-range values. strtoul and strtoull don't quite achieve this: given a leading minus sign, the value is negated "in the return type". So "-5" is happily parsed and returns 4294967291 or 18446744073709551611 instead of signalling an error.
  2. Be in the C locale, regardless of the global locale setting (or even better, give me a choice). Unless there is a way to set the global locale on a per-thread basis, that rules out strtoul, stoul and boost::lexical_cast, and leaves only istringstream (where one can imbue a locale).
  3. Be reasonably strict. It definitely must not accept trailing garbage, and ideally I'd like to ban whitespace as well. That immediately makes strtol and anything based on it a little problematic. It seems that istringstream can work here using noskipws and checking for EOF, although that might just be a GCC bug.
  4. Ideally give some control over whether the base should be assumed to be 10 or should be inferred from a 0 or 0x prefix.
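To make items 1 and 3 concrete, here is a small sketch (assuming C++11 and the default "C" locale at program start) of what std::strtoull does with a leading minus sign, leading whitespace and trailing garbage. The helper name demo_strtoull is hypothetical; it just exposes the end pointer, which is the usual way to notice leftover characters.

```cpp
#include <climits>  // ULLONG_MAX, for comparing against the wrapped result
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: runs std::strtoull in base 10 and reports how many
// characters of `text` were consumed, so the caller can spot leftovers.
unsigned long long demo_strtoull(const char* text, std::size_t& consumed)
{
    char* end = nullptr;
    unsigned long long value = std::strtoull(text, &end, 10);
    consumed = static_cast<std::size_t>(end - text);
    return value;
}
```

With this, "-5" yields ULLONG_MAX - 4 (18446744073709551611 for a 64-bit unsigned long long) with 2 characters consumed and no error reported; "  42" yields 42 with 4 characters consumed, because the whitespace is skipped silently; and "42abc" yields 42 with only 2 characters consumed, which is detectable only by comparing the end pointer with the end of the string.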

Any ideas on a solution? Is there an easy way to wrap the existing parsing machinery to meet these requirements, or is it going to end up being less work to write the parser myself?
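If writing the parser yourself does turn out to be less work, a minimal sketch might look like the following. It assumes C++11, ASCII digits, unsigned target types only, and no sign characters at all; the name parse_strict and its base convention (10, or 0 to infer the base from a 0 / 0x prefix in the same way strtoul's base 0 does) are hypothetical choices. Because it never consults a locale, the global locale setting cannot affect it.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical strict parser for unsigned integer types.
// base == 10: decimal only. base == 0: "0x"/"0X" prefix => hex,
// leading "0" => octal, otherwise decimal.
// Rejects empty input, signs, whitespace, trailing garbage and overflow.
template <typename T>
bool parse_strict(const std::string& s, T& out, int base = 10)
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    std::size_t i = 0;
    unsigned radix = 10;
    if (base == 0) {
        if (s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
            radix = 16;
            i = 2;
        } else if (s.size() > 1 && s[0] == '0') {
            radix = 8;
            i = 1;
        }
    } else if (base != 10) {
        return false;                  // only 10 and 0 supported here
    }
    if (i == s.size()) return false;   // no digits at all
    const T max = std::numeric_limits<T>::max();
    T value = 0;
    for (; i < s.size(); ++i) {
        const char c = s[i];
        unsigned d;
        if (c >= '0' && c <= '9')      d = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') d = static_cast<unsigned>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = static_cast<unsigned>(c - 'A' + 10);
        else return false;             // sign, space or other garbage
        if (d >= radix) return false;  // digit not valid in this base
        // Overflow check: value * radix + d must not exceed max.
        if (value > (max - d) / radix) return false;
        value = static_cast<T>(value * radix + d);
    }
    out = value;
    return true;
}
```

The range check happens before each multiply, so out-of-range input is rejected for whatever unsigned type the caller passes, e.g. "256" fails for unsigned char while "255" succeeds.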

Bruce Merry
  • If it's written by a machine, why are the values out of range? – andre Oct 28 '13 at 18:42
  • Sounds like you need to develop your own, or find a custom library. – BЈовић Oct 28 '13 at 18:44
  • @andre Range validation is more for the config file parsing (that is written by a human). But it's also prudent to validate any data received over a network. – Bruce Merry Oct 28 '13 at 19:15
  • Trust and responsibility. As soon as the user enters invalid data they should be informed; letting it reach the parser is very bad. Also, for the network, the network layer of your application is responsible for delivering your data safely; if you don't trust it, then who is to say that out-of-range data will be your only issue? – andre Oct 28 '13 at 19:58
  • Maybe running the string through a regex to "pre-parse" and validate it before converting it to an integer would work. – Dan Oct 28 '13 at 21:01
  • http://qt-project.org/doc/qt-5.1/qtcore/qbytearray.html#toInt and friends meet most of these, I think, but they're not exactly standalone. – Frank Osterfeld Oct 28 '13 at 21:11
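Dan's regex suggestion could be sketched like this, assuming C++11 <regex> and base 10 only. The pattern admits only ASCII decimal digits, so signs, whitespace and trailing garbage never reach the conversion; my assumption is that a digits-only string converts identically under any locale, which is what makes calling the locale-sensitive std::stoull tolerable here. The function name parse_with_regex is hypothetical.

```cpp
#include <limits>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: validate with a regex first, then convert.
// Only unsigned decimal without sign, whitespace or trailing text passes.
template <typename T>
bool parse_with_regex(const std::string& s, T& out)
{
    static const std::regex digits("[0-9]+");
    if (!std::regex_match(s, digits))
        return false;                 // "-5", " 42", "42x", "" all end here
    unsigned long long v = 0;
    try {
        v = std::stoull(s);           // digits only, so no sign/space surprises
    } catch (const std::out_of_range&) {
        return false;                 // does not fit in unsigned long long
    }
    if (v > std::numeric_limits<T>::max())
        return false;                 // does not fit in the requested type
    out = static_cast<T>(v);
    return true;
}
```

The regex does the strictness work and the conversion only has to worry about range, which is split into the std::out_of_range case and a final comparison against the target type's maximum.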

2 Answers


You basically want the num_get<char> facet of the C locale. It's somewhat complicated, so see this example. In short, you have to call use_facet<num_get<char,string::iterator> >(locale::classic()).get(begin, end, ..., outputValue).
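One way to fill in that call is sketched below, assuming C++11. It uses the facet's default iterator type, std::istreambuf_iterator<char>, because as far as I know std::locale::classic() only contains num_get<char> for that iterator, so use_facet on num_get<char, string::iterator> may throw std::bad_cast unless such a facet has first been added to a locale. The std::istringstream only supplies the ios_base& argument (basefield flags plus the imbued classic locale, from which the facet takes its ctype and numpunct). The name parse_classic is hypothetical.

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Parse `text` with the num_get<char> facet of the classic ("C") locale.
// base: 10, 8, 16, or 0 to infer it from a 0 / 0x prefix (like %i).
// Leading whitespace is rejected because num_get itself never skips it.
bool parse_classic(const std::string& text, unsigned long& out, int base = 10)
{
    typedef std::istreambuf_iterator<char> It;
    std::istringstream is(text);
    is.imbue(std::locale::classic());
    switch (base) {
    case 0:  is.unsetf(std::ios_base::basefield); break;
    case 8:  is.setf(std::ios_base::oct, std::ios_base::basefield); break;
    case 16: is.setf(std::ios_base::hex, std::ios_base::basefield); break;
    default: is.setf(std::ios_base::dec, std::ios_base::basefield); break;
    }
    const std::num_get<char, It>& ng =
        std::use_facet<std::num_get<char, It> >(std::locale::classic());
    std::ios_base::iostate err = std::ios_base::goodbit;
    unsigned long value = 0;
    It end;
    It pos = ng.get(It(is), end, is, err, value);
    if ((err & std::ios_base::failbit) || pos != end)
        return false;  // no number, out of range, or trailing characters
    out = value;
    return true;
}
```

Clearing basefield is what gives requirement 4 (prefix-based base inference). As noted in the comments below, a leading "-" may still be accepted and wrapped for unsigned types, so that case needs a separate check.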

MSalters
  • This is basically the same as using an istringstream, since that's what operator>> uses under the hood. There might be some reductions in overhead, but it still accepts negative values and wraps them. – Bruce Merry Oct 29 '13 at 06:00
  • @BruceMerry: It's indeed cutting out the overhead; it's not like there are many distinct implementations under the hood. As for negative numbers, check if it roundtrips: can you get the original string back? – MSalters Oct 29 '13 at 08:20

There are some quick hacks: parse as normal (non-robustly) and do some small checks on the input (for example, if parsing a non-negative number, check that it doesn't contain a '-' character).
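A sketch of that quick hack, assuming C++11 and base 10: ordinary std::strtoull parsing wrapped in a few cheap checks on the text. Requiring the first character to be a digit is a slightly stricter variant of the '-' check, since it also rejects the leading whitespace and '+' that strtoull would otherwise accept. The name parse_quick is hypothetical, and the global locale still applies to the strtoull call.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical "quick hack" wrapper: normal strtoull parsing plus small
// checks on the input text. Base 10 only.
bool parse_quick(const std::string& s, unsigned long long& out)
{
    // Reject empty input and anything strtoull would skip or negate:
    // leading whitespace, '+' and, above all, '-'.
    if (s.empty() || !(s[0] >= '0' && s[0] <= '9'))
        return false;
    errno = 0;
    char* end = nullptr;
    unsigned long long v = std::strtoull(s.c_str(), &end, 10);
    if (errno == ERANGE)                // too large for unsigned long long
        return false;
    if (end != s.c_str() + s.size())    // trailing garbage (or embedded NUL)
        return false;
    out = v;
    return true;
}
```

Resetting errno before the call matters, because strtoull only sets it on overflow and never clears it.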

The ultimate test of robustness is to convert the integer back to text and check that the input text and the output text are the same. When comparing the text versions, you can relax things, like accepting leading 0s or spaces.
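A minimal sketch of the round-trip test, assuming C++11 and decimal input: parse with an istringstream in the classic locale, print the value back in the same locale, and require the two strings to match exactly. The name parse_roundtrip is hypothetical, and this strict version does none of the relaxing mentioned above, so "007" and "+42" are rejected too.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical round-trip check. T: an integer type other than the char
// types (operator>> reads those as single characters, not numbers).
template <typename T>
bool parse_roundtrip(const std::string& text, T& out)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    T value = T();
    if (!(in >> std::noskipws >> value))
        return false;           // not a number at all, or out of range
    std::ostringstream back;
    back.imbue(std::locale::classic());
    back << value;
    if (back.str() != text)
        return false;           // e.g. a wrapped "-5", or "42x" -> "42"
    out = value;
    return true;
}
```

For an unsigned T, "-5" fails whether the library rejects it or wraps it, because a wrapped value prints as a different string, while for int "-5" round-trips and is accepted.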

alfC