Inspired by my previous question.
A common mistake for new C++ programmers is to read from a file with something along the lines of:
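```cpp
#include <fstream>
#include <string>

int main() {
    std::ifstream file("foo.txt");
    std::string line;
    while (!file.eof()) {
        file >> line;  // extracts one whitespace-delimited token
        // Do something with line
    }
}
```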
They will often report that the last line of the file was read twice. The common explanation for this problem (one that I have given before) goes something like:
> The extraction will only set the EOF bit on the stream if you attempt to extract the end-of-file, not if your extraction just stops at the end-of-file. `file.eof()` will only tell you if the previous read hit the end-of-file and not if the next one will. After the last line has been extracted, the EOF bit is still not set and the iteration occurs one more time. However, on this last iteration, the extraction fails and `line` still has the same content as before, i.e. the last line is duplicated.
However, the first sentence of this explanation is wrong and so the explanation of what the code is doing is also wrong.
The definition of formatted input functions (and the `std::string` extractor `operator>>(std::istream&, std::string&)` is one) defines extraction as using `rdbuf()->sbumpc()` or `rdbuf()->sgetc()` to obtain input characters. It states that if either of these functions returns `traits::eof()`, then the EOF bit is set:
> If `rdbuf()->sbumpc()` or `rdbuf()->sgetc()` returns `traits::eof()`, then the input function, except as explicitly noted otherwise, completes its actions and does `setstate(eofbit)`, which may throw `ios_base::failure` (27.5.5.4), before returning.
We can see this with a simple example that uses a `std::stringstream` rather than a file (both are input streams and behave the same way when extracting):
```cpp
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
```
It's clear here that the single extraction obtains `hello` from the string and sets the EOF bit to 1.
So what's wrong with the explanation? What's different about files that makes `!file.eof()` duplicate the last line? What's the real reason we shouldn't use `!file.eof()` as our loop condition?