
Let's say we have a stream containing simply:

hello

Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.

#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
  std::stringstream ss("hello"); // no trailing whitespace or newline
  std::string result;
  ss >> result;
  std::cout << ss.eof() << std::endl; // Outputs 1
  return 0;
}

However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object, which first consumes any leading whitespace characters. In this example there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor goes on to perform the actual extraction of the string.

The extraction is then defined as:

Characters are extracted and appended until any of the following occurs:

  • n characters are stored;
  • end-of-file occurs on the input sequence;
  • isspace(c,is.getloc()) is true for the next available input character c.

After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed. If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).

Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:

If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)

However, this is definitely not happening yet because the failbit isn't being set.
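To make the two cases concrete, here is a minimal sketch that reports eof() and fail() after a given number of extractions. StreamState and state_after_extractions are hypothetical names introduced only for this illustration, and an in-memory std::istringstream stands in for the stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: a snapshot of the two state flags discussed above.
struct StreamState {
  bool eof;
  bool fail;
};

// Performs `count` string extractions on a stream holding `text`
// and reports the stream's flags afterwards.
StreamState state_after_extractions(const std::string& text, int count)
{
  std::istringstream ss(text);
  std::string word;
  for (int i = 0; i < count; ++i) {
    ss >> word; // result deliberately not checked; only the flags matter here
  }
  return StreamState{ss.eof(), ss.fail()};
}
```

Calling state_after_extractions("hello", 1) gives eof set and fail clear, matching the output of the program above. With a count of 2, both flags are set, because the second sentry finds the stream no longer good.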

If the eof bit really is set at this point, then the notorious while (!stream.eof()) idiom fails when reading files only because of the extra \n at the end, and not because the eof bit is set later than one would expect. My compiler happily sets the eof bit as soon as the extraction stops at the end of the file.
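A rough sketch of how the loop behaves with and without the trailing newline, assuming an in-memory std::istringstream stands in for the file. Both function names are made up for this illustration.

```cpp
#include <sstream>
#include <string>

// Counts how many times the body of a `while (!in.eof())` loop runs.
int count_eof_loop_iterations(const std::string& text)
{
  std::istringstream in(text);
  std::string word;
  int iterations = 0;
  while (!in.eof()) {
    in >> word; // the extraction result is never checked
    ++iterations;
  }
  return iterations;
}

// Counts words using the extraction itself as the loop condition.
int count_successful_extractions(const std::string& text)
{
  std::istringstream in(text);
  std::string word;
  int words = 0;
  while (in >> word) {
    ++words;
  }
  return words;
}
```

For "hello" both functions return 1. For "hello\n" the first returns 2, because the first extraction stops at the newline without reaching the end and the second pass performs an extraction that fails. The second function still returns 1.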

So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?


To make it easier, the relevant sections of the standard are:

  • 21.4.8.9 Inserters and extractors [string.io]
  • 27.7.2.2 Formatted input functions [istream.formatted]
  • 27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
Joseph Mansfield
  • The evil idiom is evil because it also fails to account for a situation where the input stream doesn't even exist, like when you open a program without an open `stdin`. – Kerrek SB Jan 29 '13 at 20:06
  • +1 Well formed question, includes references to the standard and minimal examples. – Thomas Matthews Jan 29 '13 at 20:27

2 Answers

std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).

27.7.2.1 Class template basic_istream

2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.

Also, "extracting" means calling these two functions.

3 Two groups of member function signatures share common properties: the formatted input functions (or extractors) and the unformatted input functions. Both groups of input functions are described as if they obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use other public members of istream.

So eof must be set.
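As an illustration of the "as if" description, the sketch below walks a std::stringbuf directly with sgetc() and sbumpc(), stopping at whitespace or at traits::eof(). It is not how any particular library implements operator>>, and it skips no leading whitespace. word_scan_reaches_eof is a hypothetical name.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Scans one word from the start of `text` at the stream-buffer level.
// Returns true if sgetc() returned traits::eof() before any whitespace
// was seen, which is the situation in which eofbit has to be set.
bool word_scan_reaches_eof(const std::string& text)
{
  typedef std::char_traits<char> traits;
  std::stringbuf buf(text);
  for (;;) {
    traits::int_type c = buf.sgetc(); // peek at the next character
    if (traits::eq_int_type(c, traits::eof())) {
      return true; // ran off the end of the sequence
    }
    if (std::isspace(traits::to_char_type(c), std::locale())) {
      return false; // whitespace ends the word first
    }
    buf.sbumpc(); // consume the character
  }
}
```

For "hello" the scan only learns that the word is over by asking for a sixth character and getting traits::eof() back, so the function returns true. For "hello\n" or "hello world" it stops at the whitespace and returns false.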

ipc
  • Ah, you've got it! I didn't see that paragraph about the definition of extraction. Indeed, if the extraction hits the end of the file, the `eof` bit will be set. Thanks! – Joseph Mansfield Jan 29 '13 at 20:17

Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
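A small sketch of that distinction, recording separately whether the extraction succeeded and whether the end of the stream was reached. ExtractionResult and extract_word are names invented for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type keeping "did it work" apart from "did we hit the end".
struct ExtractionResult {
  bool succeeded;   // the extraction produced a value
  bool reached_end; // eofbit was set afterwards
  std::string value;
};

// Extracts a single word from a stream holding `text`.
ExtractionResult extract_word(const std::string& text)
{
  std::istringstream ss(text);
  ExtractionResult r;
  r.succeeded = static_cast<bool>(ss >> r.value); // false only if failbit or badbit is set
  r.reached_end = ss.eof();
  return r;
}
```

With "hello" the result has both succeeded and reached_end set. With "hello world" only succeeded is set. With an empty string only reached_end is set, since nothing could be extracted.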

I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.

L. F.
templatetypedef