Why does std::basic_istream::ignore() extract more characters than specified?

Question

I have the following code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    stringstream buffer("1234567890 ");
    cout << "pos-before: " << buffer.tellg() << endl;
    buffer.ignore(10, ' ');
    cout << "pos-after: " << buffer.tellg() << endl;
    cout << "eof: " << buffer.eof() << endl;
}

And it produces this output:

pos-before: 0
pos-after: 11
eof: 0

I would expect pos-after to be 10 and not 11. According to the specification, the ignore method should stop when any one of the following conditions is met:

count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
end of file conditions occurs in the input sequence, in which case the function calls setstate(eofbit)
the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()

In this case I expect rule 1 to trigger before all the other rules and to stop when the stream position is 10.

Execution shows that this is not the case. What did I misunderstand?

I also tried a variation of the code where I ignore only 9 characters. In this case the output is the expected one:

pos-before: 0
pos-after: 9
eof: 0

So it looks like in the case where ignore() extracted the count of characters, it still checks if the next character is the delimiter and if it is, it extracts it too. I can reproduce with g++ and clang++.

I also tried this variation of the code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    cout << "--- 10x get\n";
    stringstream buffer("1234567890");
    cout << "pos-before: " << buffer.tellg() << '\n';
    for(int i=0; i<10; ++i)
        buffer.get();
    cout << "pos-after: " << buffer.tellg() << '\n';
    cout << "eof: " << buffer.eof() << '\n';
    
    cout << "--- ignore(10)\n";
    stringstream buffer2("1234567890");
    cout << "pos-before: " << buffer2.tellg() << '\n';
    buffer2.ignore(10);
    cout << "pos-after: " << buffer2.tellg() << '\n';
    cout << "eof: " << buffer2.eof() << '\n';
}

And the result is:

--- 10x get
pos-before: 0
pos-after: 10
eof: 0
--- ignore(10)
pos-before: 0
pos-after: -1
eof: 1

We see that using ignore() produces an end-of-file condition on the stream, indicating that ignore() did try to extract a character after having extracted 10 characters. But in this case the 3rd condition is disabled, so ignore() should not have tried to look at the next character.

Interestingly enough, Clang 10.0 prints 11 but Clang trunk prints 10 (https://godbolt.org/z/ErKqon). MSVC also prints 10 (tested locally). — Lukas-T, Oct 05 '20 at 08:08
libstdc++ bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94749 — n. m. could be an AI, Oct 05 '20 at 08:10
Thanks, I thought it couldn't be a bug because clang and gcc agreed (on my machine) :) — fjardon, Oct 05 '20 at 08:17
It's a standard library bug, not a compiler bug. Clang normally uses the same standard library as gcc unless specifically told otherwise. — n. m. could be an AI, Oct 05 '20 at 09:09

score 1 · Answer 1 · answered Dec 20 '20 at 15:22

The specification of std::basic_istream::ignore in [istream.unformatted] paragraph 25 is a bit unclear: it states "Characters are extracted until any of the following occurs:" without any indication of order. Paragraph 25.1 states that at most n characters are extracted (unless n is std::numeric_limits<std::streamsize>::max()) and paragraph 25.3 covers the case where the next available character matches delim. However, even if the conditions can be applied in any order, there is no conflict here: the nth character is not the delimiter, so once n characters have been extracted ignore() is supposed to stop.

As was pointed out in a comment, there was/is a bug in libstdc++ which seems to be still present with the library shipping with gcc-10.2.0. Using clang++ with libc++ (if necessary, use -stdlib=libc++ when invoking clang++) doesn't show the same behavior.

As an aside: the unformatted input operations set a count of characters read, which can be accessed using gcount(). Seeking within a stream is a considerably more expensive operation than accessing this count. Using gcount() also shows the problem (and speaking of expensive operations, I also replaced the use of std::endl with '\n'; see this video or this article for more details):

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::istringstream buffer("1234567890 ");
    buffer.ignore(10, ' ');
    std::cout << "gcount: " << buffer.gcount() << '\n';
    std::cout << "eof: " << std::boolalpha << buffer.eof() << '\n';
}

Thank you. The opened bug looks similar to my first use case. Yet I believe @chris-dodd's interpretation is valid, and it contradicts (in my understanding) the claim that this bug is valid. I guess I just have to stay away from `ignore()` until the standard is clear (or iostreams are obsoleted). — fjardon, Jan 09 '21 at 13:26

score 1 · Answer 2 · answered Dec 20 '20 at 20:45

cppreference is notorious -- you should generally not rely on it for corner cases in the language, and refer to the spec instead, which says:

Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:

n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far

end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));

traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).

Using "any of" here instead of "one of" makes it clear that ignore will stop if more than one of the conditions applies. That's essentially the issue here -- both the first and third conditions apply, which brings up an underspecified corner case -- the third condition states that the next available character (that matches the delimiter) will also be extracted.

So this is exactly what the library is doing in this case -- the third condition applies, so it extracts the character. The fact that the first condition also applies is immaterial.

I really don't see how this shows anything wrong with cppreference (for that matter, it's not the one I've generally heard described as notorious for errors; that would be `cplusplus.com`). The cppreference wording adds an unnecessary "one" to what the spec says, but at worst that is mild imprecision, and it doesn't imply there is an ordering to how the rules are applied. That ordering is the real problem here: even the language spec is unnecessarily confusing as to what *should* happen when both 1 and 3 are true (it could be clearer that extracting the delimiter occurs). — ShadowRanger, Jan 08 '21 at 20:22
Thank you. This explains my first test case, but not the second test case, where the 11th character is eof and, according to the spec, the 11th character should not have been extracted. — fjardon, Jan 09 '21 at 13:20
Also, if I understand correctly what you say, it seems that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94749 is not a bug, and yet it was accepted. — fjardon, Jan 09 '21 at 13:23
