Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object, which proceeds to eat away leading whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor gets on with the actual extraction of the string.
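To make the sentry step concrete, here is a minimal sketch that constructs a std::istream::sentry by hand and reports what it did. The names SentryReport and probe_sentry are made up for illustration, and the sketch assumes the stream's default skipws flag is left set.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper type: what the sentry did to the stream.
struct SentryReport {
    bool ok;        // result of converting the sentry to bool
    int next_char;  // what peek() returns once the sentry has been constructed
};

// Hypothetical helper: builds a stream over `input`, constructs a sentry on it
// (with the default noskipws = false, so leading whitespace is skipped), and
// reports the outcome.
SentryReport probe_sentry(const std::string& input) {
    std::istringstream ss(input);
    std::istream::sentry guard(ss);
    SentryReport report;
    report.ok = static_cast<bool>(guard);  // explicit conversion, as in C++11
    report.next_char = ss.peek();
    return report;
}
```

For both "hello" and "   hello" I would expect ok to be true and next_char to be 'h'. For an empty input the sentry itself hits the end of the sequence while looking for whitespace, so ok should come back false.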
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
- n characters are stored;
- end-of-file occurs on the input sequence;
- isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed. If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
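The three stopping conditions in that quote can be sketched as a hand-written loop over the stream buffer. This is only my reading of the wording, not any library's actual implementation: extract_word and StopReason are hypothetical names, the sketch uses the C isspace instead of the locale-aware isspace(c, is.getloc()), and it deliberately touches no state bits at all. It only reports which condition ended the loop.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical: which of the three quoted conditions stopped the extraction.
enum class StopReason { MaxChars, EndOfFile, Whitespace };

// Hypothetical helper: appends characters from `buf` to `out` until one of the
// quoted conditions holds. Simplification: uses the "C" locale isspace.
StopReason extract_word(std::streambuf& buf, std::string& out, std::size_t n) {
    typedef std::char_traits<char> traits;
    out.clear();
    for (;;) {
        // Condition 1: n characters are stored.
        if (out.size() == n) return StopReason::MaxChars;

        // Look at the next character without consuming it.
        const traits::int_type next = buf.sgetc();

        // Condition 2: end-of-file occurs on the input sequence.
        if (traits::eq_int_type(next, traits::eof())) return StopReason::EndOfFile;

        // Condition 3: the next available character is whitespace.
        const char c = traits::to_char_type(next);
        if (std::isspace(static_cast<unsigned char>(c))) return StopReason::Whitespace;

        out.push_back(c);
        buf.sbumpc();  // consume the character we just stored
    }
}
```

Run over a std::stringbuf holding "hello" with a large n, this stops with StopReason::EndOfFile, and the only way the loop learns that is sgetc() returning traits::eof(). With "hello\n" it stops with StopReason::Whitespace instead and never sees the end of the sequence.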
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
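To see both flags side by side, here is a small sketch that performs a given number of extractions and returns the eof and fail flags afterwards. StreamState and state_after_extractions are hypothetical names; the results mentioned below are what I observe and would expect from the common standard library implementations, not something I can point to in the text quoted above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: the two flags of interest.
struct StreamState {
    bool eof;
    bool fail;
};

// Hypothetical helper: extracts `count` strings from a stream over `input`
// and returns the stream's flags after the last extraction.
StreamState state_after_extractions(const std::string& input, int count) {
    std::istringstream ss(input);
    std::string word;
    for (int i = 0; i < count; ++i) {
        ss >> word;
    }
    StreamState state = { ss.eof(), ss.fail() };
    return state;
}
```

With "hello", one extraction gives eof set and fail clear, and a second extraction gives both set. With "hello\n", one extraction leaves both clear, and only the second extraction sets them together.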
The consequence of the eof bit being set is that the evil idiom while (!stream.eof()) fails when reading files only because of the extra \n at the end, and not because the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
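The effect on the idiom can be sketched by running the loop over the same words with and without a trailing newline. read_with_eof_loop is a hypothetical helper, and the behaviour described afterwards assumes an implementation that sets the eof bit the way mine does.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads words with the while (!stream.eof()) idiom and
// records whatever `word` holds after each pass through the loop body,
// whether or not the extraction succeeded.
std::vector<std::string> read_with_eof_loop(const std::string& input) {
    std::istringstream ss(input);
    std::vector<std::string> seen;
    std::string word;
    while (!ss.eof()) {
        ss >> word;            // result intentionally unchecked, as in the idiom
        seen.push_back(word);
    }
    return seen;
}
```

For "a b" the loop runs twice and yields "a" and "b". For "a b\n" it runs a third time: the last extraction fails, word keeps its old value, and "b" is recorded twice.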
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
- 21.4.8.9 Inserters and extractors [string.io]
- 27.7.2.2 Formatted input functions [istream.formatted]
- 27.7.2.1.3 Class basic_istream::sentry [istream::sentry]