Type aware string to number conversion in C++

Question

Let's assume all the numbers in this world are positive integers and they can be represented by uintX_t C++ types.

Let's consider next awesome code to convert a std::string to a number:

#include <string>
#include <cstdint>
#include <iostream>

template <typename T>
T MyAwsomeConversionFunction(const std::string& value)
{
    T result = 0;
    for(auto it = value.begin(); it != value.end() && std::isdigit(*it); ++it)
    {
        result = result * 10 + *it - '0';
    }

    return result;
}

int main(int argc, const char * argv[])
{
    std::cout<<MyAwsomeConversionFunction<uint16_t>("1234")<<std::endl;
    std::cout<<MyAwsomeConversionFunction<uint16_t>("123456")<<std::endl;

    return 0;
}

As you can see there are multiple errors in this function, but I'm interested in a particular one: How to detect when my type is not large enough to contain the value (second conversion call as example) and avoid UB when making result = result * 10 + *it - '0';. I would like to know if that operation will exceed maximum value of T before making it. Is this possible?

EDIT: please check Is signed integer overflow still undefined behavior in C++? for more info about UB on arithmetic operations in C++. I want to avoid executing the line result = result * 10 + *it - '0'; when the result will overflow. In the answer the line is still executed...

EDIT2: I found the answer here: How to detect integer overflow?

EDIT3: The accepted answer applies for signed types. For unsigned types Cheers and hth. - Alf answer is correct.

Note that the standard library has [even more awesome code](http://en.cppreference.com/w/cpp/string/basic_string/stol) that will detect overflow and throw an exception. — Fred Larson, Jul 01 '14 at 14:37

score 1 · Accepted Answer · answered Jul 01 '14 at 15:55

1

I'll take a whack at this, although I may get picked apart for mistakes. This does not deal with negative values in the string (your original code doesn't either). And it's limited to ASCII digits, as Alf mentioned in a comment on his answer.

template <typename T>
T MyAwsomeConversionFunction(const std::string& value)
{
    T maxBeforeMult = std::numeric_limits<T>::max / 10;
    T result = 0;
    for(auto it = value.begin(); it != value.end() && std::isdigit(*it); ++it)
    {
        // Check if multiplying would overflow
        if (result > maxBeforeMult)
        {
            // throw overflow
        }

        result = result * 10;
        T digit = *it - 0;

        // Check if adding would overflow
        if (std::numeric_limits<T>::max - result < digit)
        {
            // throw overflow
        }

        result += digit;
    }

    return result;
}

answered Jul 01 '14 at 15:55

Fred Larson

60,987
18
112
174

This is needlessly inefficient and needlessly verbose and complex, but I understand how the OP, acting on a false belief about UB, voted it as a "solution". Such is SO. Readers should be aware of this, so that they don't choose this approach. – Cheers and hth. - Alf Jul 01 '14 at 15:57
@Cheersandhth.-Alf: Feel free to suggest (or even edit in) improvements. And I'm not sure what you mean by "false belief about UB", since signed overflow appears to be exactly his concern (although that was not made clear in the original question). – Fred Larson Jul 01 '14 at 15:59
the OP has asserted repeatedly, including in the question itself, that the types will only be unsigned. an efficent and simple solution for signed is to do the conversion as unsigned and then check signed limits. general solution is to use the standard library. – Cheers and hth. - Alf Jul 01 '14 at 16:00
Quote from the C++ standard, section 5, Expressions: "If during the evaluation of an expression, the result is not mathematically deﬁned or not in the range of representable values for its type, the behavior is undeﬁned." – Mircea Ispas Jul 01 '14 at 16:03
@Cheersandhth.-Alf: I can see nowhere where such an assertion has been made. You seem to have inferred that from the `main()`, but further edits and comments refer to signed overflow (such as the link the the question about signed overflow). – Fred Larson Jul 01 '14 at 16:06
@Felics: check out C++11 footnote 46 in §3.9.1/4 " unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type". That contradicts your interpretation of what you quoted. As a language lawyer I would argue that my own evidence is not valid because by the rules of ISO notes are non-normative. Still. I chose that because it fits nicely the comment size and the level of this discussion. – Cheers and hth. - Alf Jul 01 '14 at 16:07
@FredLarson: "all the numbers in this world are positive integers and they can be represented by uintX_t C++ types" from the start of the question. and so on. apparently you failed to notice numerous such assertions, *and* read statements about signed arithmetic that are nowhere to be found. – Cheers and hth. - Alf Jul 01 '14 at 16:09
In summary, this answer is needlessly inefficient and needlessly verbose and complex, and readers will do well to not adopt such solution. – Cheers and hth. - Alf Jul 01 '14 at 16:15
@Cheersandhth.-Alf http://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c: "Just because a type is defined to use 2s complement representation, it doesn't follow that arithmetic overflow in that type becomes defined. The undefined behaviour of signed arithmetic overflow is used to enable optimisations; for example, the compiler can assume that if a > b then a + 1 > b also; this doesn't hold in unsigned arithmetic where the second check would need to be carried out because of the possibility that a + 1 might wrap around to 0..." – Mircea Ispas Jul 01 '14 at 16:15
@Felics: unsigned types do not use two's complement representation. two's complement is a representation of signed values. – Cheers and hth. - Alf Jul 01 '14 at 16:16
@Felics: Do you understand that there is a difference in the standard between overflow of signed types and overflow of unsigned types? Are you in fact excluding signed types in your question? – Fred Larson Jul 01 '14 at 16:16
Yes, the quote from the SO answer was for signed types. My bad. The quote from the standard was for any type. – Mircea Ispas Jul 01 '14 at 16:17
@Felics: do you understand now *why* your standard's quote does not imply UB for unsigned arithmetic? – Cheers and hth. - Alf Jul 01 '14 at 16:23
@Cheersandhth.-Alf The quote from the standard doesn't not say anything about the type. This is how I interpret the quote: I have uint16_t value = 65535; value = value + 1 -> the result is not representable using uint16_t, UB is triggered. I might be wrong - I will try to find an example but now I have to go home... I'll post the sample here if I can find one. – Mircea Ispas Jul 01 '14 at 16:45
@Felics: okay. with "uint16_t value = 65535; value = value + 1" the addition produces 65535 + 1 modulo 2^16, not because of an overflow and wrapping but because that's how unsigned addition is defined by the C++ standard (it is of course usually implemented as hardware level wrapping, and that is equally of course the rationale for having the definition of unsigned arithmetic). So, the +1 result is 0, and entirely within the range of the type. That's what C++11 note 46 tries to explain. – Cheers and hth. - Alf Jul 01 '14 at 17:46

score 1 · Answer 2 · answered Jul 01 '14 at 15:55

You just need to work backwards, asking if a given digit will overflow:

// When result exceeds this thresh, appending a digit will always overflow.
static const T thresh = std::numeric_limits<T>::max() / 10;
// When result equals this thresh, appending a digit larger than
// thresh_last_digit will overflow.
static const T thresh_last_digit = std::numeric_limits<T>::max() - 10 * thresh;

for(auto it = value.begin(); it != value.end() && std::isdigit(*it); ++it)
{
    if(result > threshold)
        throw std::overflow_error(value);
    T digit = *it - '0';
    if(result == threshold && digit > thresh_last_digit)
        throw std::overflow_error(value);
    result = result * 10 + digit;
}

score 0 · Answer 3 · answered Jul 01 '14 at 14:41

0

For unsigned type T you can always do

T const original = result;
result = result * 10 + *it - '0';
if( result / 10 != original ) { throw 666; }

Except, replace the throw 666 with something.

For the apparent original problem of converting string → integer with overflow detection, see strtol and family.

answered Jul 01 '14 at 14:41

Cheers and hth. - Alf

142,714
15
209
331

2

Won't this be UB if `T` is a signed type? – Fred Larson Jul 01 '14 at 14:46
@FredLarson: yes, but then that's not asked about. – Cheers and hth. - Alf Jul 01 '14 at 14:46
The example uses only unsigned types, but it seems like it would be a nasty gotcha to leave in a function template. – Fred Larson Jul 01 '14 at 14:48
1

@FredLarson: not only that, but if the template is instantiated with floating point type the `/` will not do the assumed integer division, and if it's instantiated with some user-defined type with custom `/`, or for that matter custom `+` or `-` or `*`, it could do *anything*. Yikes! – Cheers and hth. - Alf Jul 01 '14 at 14:51
1

Could be, but I don't think the OP thought about such issues. I wasn't entirely serious. But one real problem with the code given is the call `std::isdigit(*it)`. Works fine for ASCII (7-bit) input, but more generally it's Undefined Behavior. Such are the issues of subtlety. – Cheers and hth. - Alf Jul 01 '14 at 14:56
result = result * 10 + *it - '0'; will trigger UB because of overflow. I need to avoid running this code when overflow will happen. Please check http://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c for more info – Mircea Ispas Jul 01 '14 at 15:27
@Cheersandhth.-Alf The template will not be instantiated with floating point types because they don't exist in our awesome world. All the numbers are positive integers. – Mircea Ispas Jul 01 '14 at 15:32
@Felics: re "result = result * 10 + *it - '0'; will trigger UB because of overflow.", no that's not the case for unsigned type. – Cheers and hth. - Alf Jul 01 '14 at 15:34
@Felics: I see no need to check any link you provide, because I'm not going to answer you more extensively (correcting misconceptions) than I already have. Well, except for the idea of SO as an authority. It's not, it's Herb Schildt-land, the largest on the net. – Cheers and hth. - Alf Jul 01 '14 at 15:35
Isn't it more efficient to check for overflow by `result * 10 < result` ? – Slava Jul 01 '14 at 15:37
@Slava result * 10 might trigger UB when result * 10 > max value of type. – Mircea Ispas Jul 01 '14 at 15:37
@slava: yes, but not correct. ;-) consider the value 2^(N-1)+1, where N is the number of bits. 10 times that gives you what, modulo 2*N? – Cheers and hth. - Alf Jul 01 '14 at 15:38
I meant, "modulo 2^N" – Cheers and hth. - Alf Jul 01 '14 at 15:44
@Cheersandhth.-Alf I'm am the downvoter. The UB is still present here. The overflow "check" is done AFTER executing the statement that may trigger it. The computer my catch fire before executing the if. – Mircea Ispas Jul 01 '14 at 15:47
1

@Felics: there is no UB. unsigned arithmetic is performed modulo 2^N where N is the number of bits in the value representation, and it's entirely well-defined except for division by 0. people who downvote anonymously based on their own ignorance and incompetence should imho be flogged. – Cheers and hth. - Alf Jul 01 '14 at 15:50
1

@Cheersandhth.-Alf After reading in detail CERT document about integer overflows I found out I was completely wrong when it comes to unsigned integers - their arithmetic is well defined as modulo 2^number of bits. The UB is triggered only for unsigned integer. Unfortunately I can't up vote your answer without it being edited. I also think the standard should contain a note on the section 5, Expressions saying that for unsigned types that doesn't apply. – Mircea Ispas Jul 01 '14 at 20:42

Type aware string to number conversion in C++

3 Answers3