10

Code:

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
using std::cerr;
using std::cout;
using std::stringstream;
using std::string;
using std::for_each;

void convert(const string& a_value)
{
    unsigned short i;
    if (stringstream(a_value) >> i)
        cout << a_value << " converted to " << i << ".\n";
    else
        cerr << a_value << " failed to convert.\n";
}

int main()
{
    string inputs[] = { "abc", "10", "999999999999999999999", "-10", "0" };
    for_each(inputs, inputs + (sizeof(inputs)/sizeof(inputs[0])), convert);
    return 0;
}

Output from Visual Studio Compiler (v7, v8, v9, v10):

abc failed to convert.
10 converted to 10.
999999999999999999999 failed to convert.
-10 converted to 65526.
0 converted to 0.

Output from g++ (v4.1.2, v4.3.4):

abc failed to convert.
10 converted to 10.
999999999999999999999 failed to convert.
-10 failed to convert.
0 converted to 0.

I expected the "-10" to fail to be converted to an unsigned short but it succeeds with the VC compilers. Is this a:

  • bug in the VC compilers ?
  • bug in the GNU compilers and I have an incorrect expectation ?
  • an implementation defined behaviour ?
hmjd
  • 120,187
  • 20
  • 207
  • 252
  • 1
    It wraps. Your unsigned short is 16 bit. http://stackoverflow.com/questions/2760502/question-about-c-behaviour-for-unsigned-integer-underflow – Brian Roach Oct 24 '12 at 08:47
  • 1
    Garbage in / garbage out -- if you care about string validity, then test it before converting. – paddy Oct 24 '12 at 08:48
  • @paddy It is defined behavior. Found a better SO question that has more info than first dup http://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe – Brian Roach Oct 24 '12 at 08:53
  • @BrianRoach, I can see that it wraps but I, possibly incorrectly, expected the stream `operator>>` to not do that and fail. – hmjd Oct 24 '12 at 08:55
  • @hmjd - see the two other SO questions I linked, it's a defined behavior in assignment, so I don't think it should be unexpected with the stream operator. – Brian Roach Oct 24 '12 at 08:56
  • 1
    @paddy, my expectation is garbage in nothing out and to be informed of the failure. The program tests other garbage inputs and they fail to be converted as expected. – hmjd Oct 24 '12 at 08:57
  • Hey Brian, I wasn't referring to your comment about wrapping. Of course it wrapped. Those links you gave have nothing to do with reading from streams. The OP wants to know whether reading a negative value into an unsigned from a stringstream has defined behaviour. I don't know, and cannot answer, but in general my philosophy is: test your inputs when it matters. It's very simple to do basic validation on the characters of an integer string before converting it. – paddy Oct 24 '12 at 09:13
  • 1
    @paddy The basic philosophy in this case is to use the `istream` to do the testing. He should probably declare the `std::istringstream` before the if, and use something like `if ( stream >> i >> std::ws && stream.get() == EOF )`, but that _should_ be sufficient checking. – James Kanze Oct 24 '12 at 09:24

2 Answers2

5

The answer depends on what version of C++ you are using. C++03 and earlier required the input to conform to what sscanf does (using here the "%hi" input specifier), and sscanf reads an integral value into a (signed) short, with no overflow detection; the results are then assigned (with implicit conversion) to your unsigned short. C++11 requires the equivalent of calling strtoull, which doesn't allow the - sign, and requires an error in case of overflow (which is undefined behavior in sscanf, and thus C++03).

In practice, all reasonable implementations of C++03 did check for overlow, and the "undefined behavior" in such cases corresponded to that which is now required. On the other hand, they were required to accept the minus sign, which is now (logically) forbidden.

EDIT (correction): On rereading the requirements of strtoull, I find that it does require accepting the minus sign. So as stupid as it seems, the standard does require input to an unsigned integral type to accept the minus sign. (Note too that the behavior of strtoull depends on the global C locale, which may accept additional posibilities.)

EDIT (further clarification): As ectamur points out, this should be an error (in C++11), because (unsigned long long)( -10 ) will be too large to be represented in an unsigned short. On the other hand, it is still undefined behavior in pre-C++03 (which is perhaps what VC++ is conforming to---so whatever they do is "correct").

James Kanze
  • 150,581
  • 18
  • 184
  • 329
3

g++ is correct. The arithmetic extractors for unsigned integer types are defined in 27.7.2.2.2p1 as depending on num_get<>; 22.4.2.1.2p3 states that:

Stage 3: The sequence of chars accumulated in stage 2 (the field) is converted to a numeric value by the rules of [...] — For an unsigned integer value, the function strtoull.

and that the number stored should be

— the most positive representable value, if the field represents a value too large positive to be represented in val. ios_base::failbit is assigned to err.

On the operation of strtoull, C++ defers to C, which is a little unclear on the result of trying to convert a negative-sign field with strtoull; it states that "the value resulting from the conversion is negated (in the return type)", which for unsigned long long would result in sign wrap (to ULONGLONG_MAX - 10 + 1).

So strtoull returns a value too large to be representable in unsigned short, and num_get is required to store USHORT_MAX and set the fail bit.

On the other hand, 22.4.2.1.2p3 also states that the number stored should be (my emphasis):

— the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.

The presence of this clause indicates that the rules of strtoull are not to be followed strictly for a field with negative sign; under this interpretation num_get is required to store 0 and set the fail bit.

In either case the conversion is required to fail.

ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • Good point about the overflow. (I rather suspect that there is still some unintentional behavior in the C++ standard here. Why should `"-10"` be acceptable when inputting to a `unsigned long long`, but not to other unsigned types?) – James Kanze Oct 24 '12 at 09:21
  • 1
    For reference of future visitors, this is open issue [LWG 1169](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#1169) – Cubbi Jan 17 '15 at 17:27