3

I just discovered (much to my surprise) that the following inputs do not cause std::stoi to throw an exception:

3.14
3.14helloworld

This violates the principle of least surprise, since none of these is a validly formatted integer value.

Note, perhaps even more surprisingly 3.8 is converted to the value 3.

Is there a more stringent version of std::stoi which will throw when an input really is not a valid integer? Or do I have to roll my own?

As an aside, why does the C++ standard library implement std::stoi this way? The only practical use this function has is to desperately try and obtain some integer value from random garbage input - which doesn't seem like a very useful function.

This was my workaround.

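#include <cstddef>
#include <stdexcept>
#include <string>

static int convertToInt(const std::string& value)
{
    std::size_t index{};
    // std::stoi stores the number of characters it consumed in index
    int converted_value{std::stoi(value, &index)};

    // Reject input with anything left over after the number
    if(index != value.size())
    {
        throw std::runtime_error("Bad format input");
    }

    return converted_value;
}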
FreelanceConsultant
  • 10
    `stoi` and friends behave just like taking input does. If you have `int foo; cin >> foo;` and type in `3.14` for the input you get `3` in `foo` just like `stoi`. You can use `stoi`'s second parameter to confirm if all of the input was converted or not. – NathanOliver May 12 '22 at 21:31
  • 4
    Anyway, did you consider why the function has a `size_t*` parameter after the string that it processes? Did you try checking the value of the pointed-at `size_t` afterwards? Do you see how that relates to the conversion that was performed? If not, did you try checking the documentation in order to understand it? – Karl Knechtel May 12 '22 at 21:33
  • 3
    It is kinda weird that stream-processing assumptions ("The value to be read is the first thing in the stream, and leftover data is left in the stream to be handled later") get applied when there is no stream in the picture. There is value in having parsing functions behave consistently, and value in having them to behave optimally for the particular task, and in this area consistency won. – Ben Voigt May 12 '22 at 21:42
  • 1
    @FreelanceConsultant: Can we agree that the two-argument function call is useful for building stream processors, because it returns the point between data consumed and data still to be processed? So then the one-argument call should either behave like the two-argument call, or else it should have a different name.... – Ben Voigt May 12 '22 at 21:46
  • 1
    Note that your posted "workaround" is inconsistent, because your program is accepting leading [whitespace](https://en.wikipedia.org/wiki/Whitespace_character) (which is ignored by `std::stoi`), but throwing an exception on trailing whitespace. Therefore, you may want to test the trailing input with [`std::isspace`](https://en.cppreference.com/w/cpp/string/byte/isspace) before throwing an exception. See [this answer of mine to another question](https://stackoverflow.com/a/70818264/12149471) for an example of how this can be done. – Andreas Wenzel May 12 '22 at 21:48
  • @NathanOliver: The pre-existing non-`std::string` version (`strtol`) is made so that you can build a stream processor with it, as it gives back the new stream pointer where the next operation should take place. The `std::string` function is then a bit awkward, because `std::string` has no corresponding operation like "advance the pointer by N characters", a substring is a separate instance entirely. Presumably with `string_view` it's back to being a cheap operation to continue more parsing operations where the first one ended. – Ben Voigt May 12 '22 at 21:50
  • 2
    @AndreasWenzel "*Therefore, you may want to test the trailing input with `std::isspace` before throwing an exception*" - even that is likely not stringent enough for most people, for example in `"3 hello"` the reported position would be whitespace, just not trailing whitespace. What I would do instead is start at the reported position and use `string::find_first_not_of()` or `std::find_if()` to check if there is any unparsed non-whitespace characters left in the input string, and then throw if any are found. – Remy Lebeau May 12 '22 at 22:13
  • @RemyLebeau: Yes, I meant that `std::isspace` should be used in a loop on all remaining trailing characters. It should not just test the first trailing character. That is how I did it in [this answer of mine to another question](https://stackoverflow.com/a/70818264/12149471). Of course, if you use `std::string::find_first_not_of` instead of `std::isspace`, then you won't be needing a loop, but this would require hard-coding the whitespace characters (which may or may not be desirable). – Andreas Wenzel May 12 '22 at 22:18
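
The suggestion in the comments above (accept trailing whitespace, but reject anything else left over) could look roughly like the sketch below. The name `convertToIntStrict` is made up for illustration, and the code assumes that whatever `std::isspace` reports as whitespace in the current C locale is acceptable after the number.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the standard library): converts like
// std::stoi, but rejects any non-whitespace character left after the number.
int convertToIntStrict(const std::string& value)
{
    std::size_t index{};
    // May itself throw std::invalid_argument or std::out_of_range.
    const int converted_value{std::stoi(value, &index)};

    // Check every character after the parsed number, not just the first one.
    const bool only_whitespace_left = std::all_of(
        value.begin() + static_cast<std::ptrdiff_t>(index), value.end(),
        [](unsigned char c) { return std::isspace(c) != 0; });

    if (!only_whitespace_left)
    {
        throw std::runtime_error("Bad format input");
    }
    return converted_value;
}
```

With this version, leading and trailing whitespace are treated the same way: `"  42  "` is accepted, while `"3.14"` and `"3 hello"` both throw.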

2 Answers

6

The answer to your question:

Is there a more stringent version of std::stoi?

is: No, not in the standard library.

std::stoi behaves as described on cppreference:

Discards any whitespace characters (as identified by calling std::isspace) until the first non-whitespace character is found, then takes as many characters as possible to form a valid base-n (where n=base) integer number representation and converts them to an integer value. The valid integer value consists of the following parts: . . . . .

If you want a more robust version of std::stoi that fits your particular needs, you really do need to write your own function.

There are so many possible implementations that there is no single "correct" solution. It depends on your needs and programming style.

Here is one of many possible solutions:

#include <iostream>
#include <string>
#include <utility>
#include <regex>

// One example; many other solutions are possible
std::pair<int, bool> stoiSpecial(const std::string& s) {

    int result{};
    bool validArgument{};

    // Accept only an optional sign followed by decimal digits
    if (std::regex_match(s, std::regex("[+-]?[0-9]+"))) {
        try {
            result = std::stoi(s);
            validArgument = true;
        }
        catch (...) {}  // e.g. std::out_of_range: validArgument stays false
    }
    return {result, validArgument};
}

// Some test code
int main() {
    
    std::string valueAsString{};
    std::getline(std::cin,valueAsString);

    if (const auto& [result, validArgument] = stoiSpecial(valueAsString); validArgument)
        std::cout << result << '\n';
    else
        std::cerr << "\n\n*** Error: Invalid Argument\n\n";
}
A M
  • Nice use of `regex_match` that actually provides a simpler way to solve my problem. It will also work for strings and floating point format, which I also have although I didn't mention this in the main question. On the point of `regex_match` - how can I match printable strings? Will `"[\s-~]+"` work? I am assuming `\s` means a space character. I haven't actually checked how to match a space character - yet. I assume there is a way to match spaces which *does not* match things like tabs and new lines. – FreelanceConsultant May 13 '22 at 10:12
  • In other words match all ascii characters from the value of [space] to `~` [tilde]. – FreelanceConsultant May 13 '22 at 10:13
4

Is there a more stringent version of std::stoi which will throw when an input really is not a valid integer? Or do I have to roll my own?

You will have to roll your own, because your demands clash with the one, consistent, unsurprising way in which all "string to integer" functionality in both C and C++ is defined.

First off, you'd have to come up with your definition of "a valid integer". Do you accept leading 0 (octal), leading 0x (hexadecimal), and / or leading 0b (binary)? Do you accept leading whitespace?

If you're OK with both, your workaround is good enough. Otherwise, you'd also have to check that the first character of your string is non-null and satisfies isdigit.
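
A minimal sketch of that stricter check, assuming "a valid integer" means an unsigned run of decimal digits with nothing before or after it. The name `convertDigitsOnly` and the choice to reject signs are assumptions made for illustration. Note that with base 10 a leading 0 is just another decimal digit, and "0x1F" is rejected because parsing stops after the 0.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: accepts only strings that start with a decimal digit
// and consist entirely of one base-10 integer.
int convertDigitsOnly(const std::string& value)
{
    // Reject empty input, leading whitespace, and signs up front.
    if (value.empty() || !std::isdigit(static_cast<unsigned char>(value.front())))
    {
        throw std::invalid_argument("Input must start with a digit");
    }

    std::size_t index{};
    const int converted_value{std::stoi(value, &index, 10)};

    // Anything left over (".14", "x1F", trailing spaces) makes the input invalid.
    if (index != value.size())
    {
        throw std::invalid_argument("Trailing characters after integer");
    }
    return converted_value;
}
```

Values that do not fit into an int still surface as the std::out_of_range thrown by std::stoi itself.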


I just discovered (much to my surprise) that the following inputs do not cause std::stoi to throw an exception:

Reading a good reference on any function you are not very familiar with before using it is a rather basic requirement.

That reference states very clearly that, after skipping any leading whitespace, it will take "as many characters as possible" to form "a valid [...] integer number representation", and that the second argument "will receive the address of the first unconverted character".
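
To make that concrete, here is a small sketch of what the second argument gives you. With std::stoi it is reported as a count of characters processed (an index into the string) rather than as a pointer. The helper name `splitLeadingInt` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: returns the parsed value together with whatever
// std::stoi left unconverted.
std::pair<int, std::string> splitLeadingInt(const std::string& input)
{
    std::size_t consumed{};
    const int value{std::stoi(input, &consumed)};
    // consumed counts every character processed, including skipped whitespace.
    return {value, input.substr(consumed)};
}
```

For "3.14helloworld" this yields the value 3 and the remainder ".14helloworld". An empty remainder means the whole string was converted.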

This violates the principle of least surprise, since none of these is a validly formatted integer value.

Note, perhaps even more surprisingly 3.8 is converted to the value 3.

Is there a more stringent version of std::stoi which will throw when an input really is not a valid integer? Or do I have to roll my own?

There is one significant problem here: You have made assumptions, haven't bothered to check them with a reference, and are now digging in your heels, insisting that you know better. The behavior you observed is internally consistent across C++'s istream operator>>, the std::sto* family, and C's *scanf, strto*, and ato* families. It is also how Perl's int and similar functions in a number of other languages work.

(By the way, this is true for the various floating-point parsing functions as well.)


Why is std::stoi implemented this way?

Because this is the most efficient implementation for the general use-case.

The only practical use this function has is to desperately try and obtain some integer value from random garbage input - which doesn't seem like a very useful function.

Consider:

4;3.14;16

That is clearly not "random garbage input", but semicolon-separated data -- something encountered quite often, you will agree.

If "reading an int" threw an exception on non-digit input, as you suggest, we would be looking at a minimum of two exceptions being thrown for parsing this very non-exceptional line of input. Alternatively, we would have to pass over that input twice: first to find the semicolons / line ends (either writing into the input string or setting up several temporary variables), then a second time for parsing. That would be very inefficient.
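
As a rough illustration of the single-pass approach, the sketch below walks such a line with std::strtod, which hands back a pointer to the first unconverted character. The function name `parseSemicolonSeparated` and its error handling are assumptions for the example, not an established API. It parses doubles because the sample line contains 3.14.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses a line such as "4;3.14;16" in a single pass.
std::vector<double> parseSemicolonSeparated(const std::string& line)
{
    std::vector<double> values;
    const char* cursor = line.c_str();
    const char* const end = cursor + line.size();

    while (cursor != end)
    {
        char* stop = nullptr;
        // strtod stops at the first character that cannot belong to the number.
        const double value = std::strtod(cursor, &stop);
        if (stop == cursor)
        {
            throw std::runtime_error("Expected a number");
        }
        values.push_back(value);
        cursor = stop;  // continue exactly where the conversion ended

        if (cursor == end)
        {
            break;
        }
        if (*cursor != ';')
        {
            throw std::runtime_error("Expected ';' between fields");
        }
        ++cursor;  // step over the separator
    }
    return values;
}
```

No exception is thrown for well-formed input, and each character is inspected once. The "stop at the first unconvertible character" behavior is what makes this possible.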

DevSolar