32

For example,

#include <iostream>

int main() {
  unsigned n{};
  std::cin >> n;
  std::cout << n << ' ' << (bool)std::cin << std::endl;
}

When given the input -1, clang 6.0.0 outputs 0 0 while gcc 7.2.0 outputs 4294967295 1. I'm wondering which is correct. Or maybe both are correct because the standard does not specify this? By "fail", I mean that (bool)std::cin evaluates to false. clang 6.0.0 fails on the input -0 too.


As of Clang 9.0.0 and GCC 9.2.0, both compilers (Clang with either libstdc++ or libc++) agree on the result of the program above, independent of the C++ version (>= C++11) used, and print

4294967295 1

i.e. they store the most positive representable value (UINT_MAX here) and do not set the failbit on the stream.
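
A self-contained way to reproduce this without interactive input is to read from a std::istringstream instead of std::cin (a minimal sketch; the exact output depends on the standard library version in use):

#include <iostream>
#include <sstream>

int main() {
  std::istringstream in{"-1"};  // same input as typing -1 into std::cin
  unsigned n{};
  in >> n;
  // Recent libstdc++ and libc++ both print "4294967295 1" here,
  // i.e. the maximum value is stored and the failbit is not set.
  std::cout << n << ' ' << (bool)in << std::endl;
}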

walnut
Lingxi
  • What does "fail" mean here? You can't get -1, that's for sure. – Arndt Jonasson Apr 19 '18 at 12:39
  • Have been trying to answer this ... it's a rabbit hole. At a minimum, could you add which C++ standard you are compiling against? There are so many "up to", "after", etc. changes that not knowing this is going to make it almost impossible to give a definitive answer. – Richard Critten Apr 19 '18 at 12:42
  • @ArndtJonasson I would assume that 'fail' means that `failbit` of the input stream was set - and therefore the second output would be 0 rather than 1. – eerorika Apr 19 '18 at 12:48
  • @RichardCritten Added C++17 tag. – Lingxi Apr 19 '18 at 12:50
  • Additional references: [libc++ bug](https://bugs.llvm.org/show_bug.cgi?id=36914) resulting in the changed behavior and [LWG issue](http://cplusplus.github.io/LWG/lwg-defects.html#1169) resulting in C++17 change. From neither of those it is clear to me whether failbit should be set. – walnut Oct 25 '19 at 01:06

3 Answers

26

I think that both are wrong in C++17¹ and that the expected output should be:

4294967295 0

While the returned value is correct for the latest versions of both compilers, I think that ios_base::failbit should be set, but I also think there is confusion about the notion of the field to be converted in the standard, which may account for the current behaviors.

The standard says — [facet.num.get.virtuals#3.3]:

The sequence of chars accumulated in stage 2 (the field) is converted to a numeric value by the rules of one of the functions declared in the header <cstdlib>:

  • For a signed integer value, the function strtoll.

  • For an unsigned integer value, the function strtoull.

  • For a floating-point value, the function strtold.

So we fall back to std::strtoull, which must return² ULLONG_MAX and not set errno in this case (which is what both compilers do).
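
For illustration, calling std::strtoull directly shows this behavior (a small sketch; on a platform with a 64-bit unsigned long long, ULLONG_MAX is 18446744073709551615):

#include <cerrno>
#include <cstdlib>
#include <iostream>

int main() {
  errno = 0;
  char* end = nullptr;
  unsigned long long v = std::strtoull("-1", &end, 10);
  // The whole field "-1" is consumed, the value is negated in the
  // return type, and errno is left untouched (no range error occurred).
  std::cout << v << '\n';               // ULLONG_MAX
  std::cout << errno << '\n';           // 0
  std::cout << (*end == '\0') << '\n';  // 1: nothing left unconverted
}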

But in the same block (emphasis is mine):

The numeric value to be stored can be one of:

  • zero, if the conversion function does not convert the entire field.

  • the most positive (or negative) representable value, if the field to be converted to a signed integer type represents a value too large positive (or negative) to be represented in val.

  • the most positive representable value, if the field to be converted to an unsigned integer type represents a value that cannot be represented in val.

  • the converted value, otherwise.

The resultant numeric value is stored in val. If the conversion function does not convert the entire field, or if the field represents a value outside the range of representable values, ios_base::failbit is assigned to err.

Notice that all of this talks about the "field to be converted" and not about the actual value returned by std::strtoull. The field here is actually the widened sequence of characters '-', '1'.

Since the field represents a value (-1) that cannot be represented by an unsigned, the returned value should be UINT_MAX and the failbit should be set on std::cin.
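
To make that reading concrete, here is a hypothetical sketch of the decision it would imply for an unsigned target; the function name and structure are invented for illustration only and do not reflect how libstdc++ or libc++ actually implement num_get::get:

#include <climits>
#include <cstdlib>
#include <ios>
#include <string>

// Hypothetical stage-3 step under the reading argued above: check the
// field itself, not just the value returned by strtoull.
void store_unsigned(const std::string& field, unsigned& val,
                    std::ios_base::iostate& err) {
  char* end = nullptr;
  unsigned long long v = std::strtoull(field.c_str(), &end, 10);
  if (end != field.c_str() + field.size()) {
    val = 0;                          // conversion did not consume the field
    err |= std::ios_base::failbit;
  } else if ((!field.empty() && field[0] == '-' && v != 0) || v > UINT_MAX) {
    val = UINT_MAX;                   // field represents a value that cannot
    err |= std::ios_base::failbit;    // be represented in val -> failbit
  } else {
    val = static_cast<unsigned>(v);   // the converted value, otherwise
  }
}

Under this sketch, "-1" stores UINT_MAX and sets the failbit, while "-0" stores 0 without failing, since that field represents a representable value.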


¹ clang was actually right prior to C++17 because the third bullet in the above quote was:

- the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.

² std::strtoull returns ULLONG_MAX because of the following (thanks @NathanOliver), from the C standard, 7.22.1.4.5:

If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. [...] If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
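
"Negated (in the return type)" means negation with unsigned wrap-around, so the result for "-1" is ULLONG_MAX rather than an error. A tiny check illustrating this:

#include <cassert>
#include <climits>
#include <cstdlib>

int main() {
  // Unsigned arithmetic is modular, so negating 1 yields the maximum value.
  static_assert(0ULL - 1ULL == ULLONG_MAX, "unsigned negation wraps around");
  assert(std::strtoull("-1", nullptr, 10) == ULLONG_MAX);
}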

Holt
  • I believe what you are looking for is *the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1.* with *If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type)* from 7.22.1.4.5 of the C standard. I think with that it would make this answer "standard complete" :) – NathanOliver Apr 19 '18 at 13:04
  • @NathanOliver I did add this but I am actually rewriting the answer because I found some other evidence in the standard - I am deleting it while I edit it. Thanks for the quote anyway! – Holt Apr 19 '18 at 13:05
  • @NathanOliver I have updated the answer, I would be happy to get your point of view on it. – Holt Apr 19 '18 at 13:13
  • I'm not sure where you got your second quote but it doesn't match what I have for C++17: https://timsong-cpp.github.io/cppwp/facet.num.get.virtuals#3.3. According to that it should be the most positive value. – NathanOliver Apr 19 '18 at 13:24
  • Ah, looked at my C++11 draft and that is the language in there. Looks like there was a change somewhere along the line. – NathanOliver Apr 19 '18 at 13:26
  • @NathanOliver N4296 (the latest C++14 draft, if I am not wrong); I'll update with your quote since this question is tagged C++17. I don't think this changes my conclusion. – Holt Apr 19 '18 at 13:27
  • Aha, I was still thinking it should store the max in `n` but iirc if failbit is set then the value passed to `cin` gets set to 0. +1 – NathanOliver Apr 19 '18 at 13:32
  • @NathanOliver If failbit is set, shouldn't the value not be touched instead of being set to `0`? – Lingxi Apr 19 '18 at 13:40
  • @NathanOliver I think that you are actually right and that `n` should equal `UINT_MAX` but the failbit should be set, so both compilers would be wrong? – Holt Apr 19 '18 at 13:42
  • @Lingxi and holt, at least [this](https://stackoverflow.com/questions/32378911/why-does-cin-expecting-an-int-change-the-corresponding-int-variable-to-zero-in) says it should be set to 0. – NathanOliver Apr 19 '18 at 13:59
  • @NathanOliver I did not find reference in [this](https://timsong-cpp.github.io/cppwp/istream.formatted.arithmetic), and I just remembered why I never answered stream related questions in C++ ;) – Holt Apr 19 '18 at 14:01
  • No kidding. Streams are a PITA. – NathanOliver Apr 19 '18 at 14:04
  • @NathanOliver Actually, I think that cppreference (and the linked answer) are referencing the first bullet above *"zero, if the conversion function does not convert the entire field."* which happens if the input does not contain a valid number. – Holt Apr 19 '18 at 14:05
  • That makes a lot of sense. If that is the case then both are wrong as the output should be 4294967295 0: http://coliru.stacked-crooked.com/a/ea3475f2633adcd9. Too bad I can't +1 again – NathanOliver Apr 19 '18 at 14:09
  • At the end you're saying that the fail bit should be set, but at the beginning you say it shouldn't... – wizzwizz4 Apr 19 '18 at 17:18
  • @Holt Oh, never mind. I boolean notted what you said. – wizzwizz4 Apr 19 '18 at 17:37
  • Note that the [strtoul example at cppreference](http://en.cppreference.com/w/cpp/string/byte/strtoul#Example) converts -40 to unsigned without a problem, so why would -1 be an error? – Bo Persson Apr 20 '18 at 07:59
  • @BoPersson This is mentioned in my answer - The behavior of `strtoull` is expected, but the return value of `std::num_get::get` is not directly the converted value returned by `strtoull`, which is what I discuss in the end of my answer. – Holt Apr 20 '18 at 08:10
  • @BoPersson In particular, the quote *"the most positive representable value, if the field **to be converted** to an unsigned integer type represents a value that cannot be represented in val."* imply (from my point of view) a check on the actual sequence of characters, and not only on the return value of `strtoull`. – Holt Apr 20 '18 at 08:12
  • Another plausible reading is that the value represented by "the field to be converted" is the value determined from `strto*`'s rules, which would make GCC right. – T.C. Apr 24 '18 at 21:35
  • @T.C. I agree this is not 100% unambiguous, but why use "the field to be converted" instead of "the converted value" in this case? Furthermore, even in this case gcc would likely be wrong, since the conversion function is strtoull, which would return ULLONG_MAX, which is likely not representable by an unsigned int. – Holt Apr 24 '18 at 22:41
  • @Holt This came up in another question and I noticed that recent versions of GCC and Clang are in agreement on this, but disagree with your answer. Could you revisit it? – walnut Oct 20 '19 at 20:26
  • @uneven_mark Can you point me to this other question? I did not notice changes in the standard that could affect this, so I currently stand my ground on this (even after re-reading this for the 10th time) unless proven otherwise or unless there is some "official" reasoning from gcc and clang. – Holt Oct 23 '19 at 07:51
  • @Holt This problem wasn't really discussed, but in [this question](https://stackoverflow.com/questions/58476766/c-negative-verification-is-not-working?noredirect=1&lq=1) the code would have worked as the OP intended if the failbit was set for negative inputs, although the OP of that question seems to have misunderstood `unsigned` in general. I was surprised by the behavior when I tested it myself and searched for a related question, ending up here. – walnut Oct 23 '19 at 09:16
  • Since this seems like a very basic question to ask when doing IO, I thought it was worthwhile to ask for clarification. Maybe someone else will provide reasoning from the standard libraries' point of view for the behavior, especially libc++'s, which seems to have changed since this question was posted originally. – walnut Oct 23 '19 at 09:19
  • @Holt [Here](https://bugs.llvm.org/show_bug.cgi?id=36914#c3) is a comment in the bug report that caused the change in libc++'s behavior, discussing whether the failbit should be set. I don't know whether this adds anything new to this thread. – walnut Oct 25 '19 at 01:21
  • @uneven_mark That's the current standard, I think. If you read the standard for the C functions `strto*`, you'll notice how hard to understand they are (just check the quote [here](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_006.html)). From my point of view, both gcc and clang now follow the specification of the C functions for the returned value (see https://github.com/llvm-mirror/libcxx/commit/f382e530159b42de9120ddecaf57d540471f5962 for clang). [...] – Holt Oct 25 '19 at 07:18
  • @uneven_mark [...] But as far as I understand, when checking whether `err` needs to be set or not (in `num_get`, see, e.g., http://cplusplus.github.io/LWG/lwg-defects.html#1169), there is mention of the *field to be converted*, and the **to be** leads me to think that extra checks need to be performed by `num_get`, regardless of the `errno` state or the value returned by `strto*`. – Holt Oct 25 '19 at 07:21
2

The question is about differences between the library implementations (libstdc++ and libc++), and not so much about differences between the compilers (gcc, clang).

cppreference clears these inconsistencies up pretty well:

The result of converting a negative number string into an unsigned integer was specified to produce zero until C++17, although some implementations followed the protocol of std::strtoull which negates in the target type, giving ULLONG_MAX for "-1", and so produce the largest value of the target type instead. As of C++17, strictly following std::strtoull is the correct behavior.

This summarises to:

  • ULLONG_MAX (4294967295) is correct going forward, since C++17 (both compilers do it correctly now)
  • Previously it should have been 0 with a strict reading of the standard (pre-C++17)
  • Some implementations (notably libstdc++) followed the std::strtoull protocol instead (which is now considered the correct behavior)

Whether the failbit gets set, and why it was set, might be a more interesting question (at least from the language-lawyer perspective). In libc++ (Clang) version 7 it now does the same as libstdc++ - this seems to suggest that it was chosen to match libstdc++ going forward (even though this goes against the letter of the standard, which says the stored value should be zero before C++17) - but so far I've been unable to find a changelog or documentation for this change.

The interesting block of text reads (assuming pre-C++17):

If the conversion function results in a negative value too large to fit in the type of v, the most negative representable value is stored in v, or zero for unsigned integer types.

According to this, the value is specified to be 0. Additionally, nowhere is it indicated that this should result in setting the failbit.

darune
  • The top answer does say the same about the value set. However it additionally asserts that the failbit should be set in both cases, which is not what the current compilers do. The cppreference page is not explicitly mentioning whether it should be set. Is this supposed to mean that it should not be set in either case? Then the old Clang behavior would be wrong as well. I think this is what needs to be explained explicitly. – walnut Oct 21 '19 at 22:15
  • @uneven_mark I have updated my answer - it turns out clang did the same as gcc – darune Oct 22 '19 at 07:37
  • This is a library issue, so really, instead of gcc vs clang, we should be talking about libstdc++ vs libc++. Godbolt uses libstdc++ by default for both. You need to specify `-stdlib=libc++`, then you will observe the OP's results. I think there needs to be a more detailed explanation for why the failbit shouldn't be set, given that this is a language-lawyer question and the highly upvoted answer is contradicting for both C++17 and before. – walnut Oct 22 '19 at 08:27
  • @uneven_mark I'm pretty sure this is a bug in libc++ (don't you think?) - language-lawyer or not – darune Oct 22 '19 at 11:38
  • I don't know whether it is. As you can see in the top answer there was discussion about what the standard actually means, as it says "*If the conversion function does not convert the entire field, or if the field represents a value outside the range of representable values, ios_­base​::​failbit is assigned to err.*". Standard libraries seem to agree now that this does not mean that the failbit should be set in this situation, but again the top answer argues differently. I don't know what the correct answer is myself. – walnut Oct 22 '19 at 12:00
  • I might go looking for the relevant libc++ patch later. It might contain at least libc++ developer's reason for the change (bug fix?) – walnut Oct 22 '19 at 12:01
  • @uneven_mark where do we find those? I could tell from my own testing that they 'fixed the bug' between versions 6 and 7 – darune Oct 22 '19 at 12:04
  • I would start by searching their bugzilla for a closed bug here: https://bugs.llvm.org/ If I found nothing I would try to figure out the relevant code in the library and git-bisect on the change. The commit message would then hopefully be helpful. This may be time consuming though. – walnut Oct 22 '19 at 12:11
  • I have linked the libc++ bug I found in the question comments. – walnut Oct 25 '19 at 01:11
0

The intended semantics of your std::cin >> n command are described here (as, apparently, std::num_get::get() is called for this operation). There have been some changes to the semantics of this function, specifically w.r.t. whether 0 is stored or not, in C++11 and then again in C++17.

I'm not entirely sure, but I believe these differences may account for the different behavior you're seeing.

einpoklum