11

This code loops forever:

#include <iostream>
#include <fstream>
#include <sstream>

int main(int argc, char *argv[])
{
    std::ifstream f(argv[1]);
    std::ostringstream ostr;

    while(f && !f.eof())
    {
        char b[5000];
        std::size_t read = f.readsome(b, sizeof b);
        std::cerr << "Read: " << read << " bytes" << std::endl;
        ostr.write(b, read);
    }
}

It's because readsome is never setting eofbit.

cplusplus.com says:

Errors are signaled by modifying the internal state flags:

eofbit The get pointer is at the end of the stream buffer's internal input array when the function is called, meaning that there are no positions to be read in the internal buffer (which may or not be the end of the input sequence). This happens when rdbuf()->in_avail() would return -1 before the first character is extracted.

failbit The stream was at the end of the source of characters before the function was called.

badbit An error other than the above happened.

Almost the same, the standard says:

[C++11: 27.7.2.3]: streamsize readsome(char_type* s, streamsize n);

32. Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit) which may throw an exception, and return. Otherwise extracts characters and stores them into successive locations of an array whose first element is designated by s.

  • If rdbuf()->in_avail() == -1, calls setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts no characters;
  • If rdbuf()->in_avail() == 0, extracts no characters
  • If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).

33. Returns: The number of characters extracted.

That the in_avail() == 0 condition is a no-op implies that ifstream::readsome itself is a no-op if the stream buffer is empty, but the in_avail() == -1 condition implies that it will set eofbit when some other operation has led to in_avail() == -1.

This seems like an inconsistency, even despite the "some" nature of readsome.

So what are the semantics of readsome and eof? Have I interpreted them correctly? Are they an example of poor design in the streams library?


(Stolen from the [IMO] invalid libstdc++ bug 52169.)

Lightness Races in Orbit

4 Answers

5

I think this is a customization point, not really used by the default stream implementations.

in_avail() returns the number of chars it can see in the internal buffer, if any. Otherwise it calls showmanyc() to try to detect if chars are known to be available elsewhere, so a buffer fill request is guaranteed to succeed.

In turn, showmanyc() will return the number of chars it knows about, if any, or -1 if it knows that a read will fail, or 0 if it doesn't have a clue.

The default implementation (basic_streambuf) always returns 0, so that is what you get unless you have a stream with some other streambuf overriding showmanyc.

Your loop is essentially read-as-many-chars-as-you-know-is-safe, and it gets stuck when that is zero (meaning "not sure").

Bo Persson
2

I don't think that readsome() is meant for what you're trying to do (read from a file on disk)... from cplusplus.com:

The function is intended to be used to read binary data from certain types of asynchronic sources that may wait for more characters, since it stops reading when the local buffer exhausts, avoiding potential unexpected delays.

So it sounds like readsome() is intended for streams from a network socket or something like that, and you probably want to just use read().

bdow
  • Actually I'm reading an NMEA stream from a GPS receiver. But FWIW the original code wasn't mine; it was from another SO question, and I merely wanted to rationalise about the behaviour of `.readsome`; whether it's appropriate for a file or not, I'd expect `eof` to be set when EOF is reached. – Lightness Races in Orbit Feb 21 '12 at 00:05
  • Ah, ok. If the structure of the code still mirrors the snippet above, I would still suggest using read()... readsome() seems to use non-blocking semantics, which would just cause a tight do-nothing loop when no data has come in. Just to clarify: this code still does nothing when the socket is closed? – bdow Feb 21 '12 at 00:21
  • You're right, that's the behaviour of `readsome()`. This question is about rationalising about that behaviour, regardless of any particular use. And I'm not sure what happens when the socket is closed as I no longer have the original testcase running anywhere. – Lightness Races in Orbit Feb 21 '12 at 10:02
  • I think that's the root of the question: when EOF occurs on the stream. If there's no data available, but there may be in the future (a pipe, socket, or similar structure that is open but currently has no data available), then that's different from the case where the system knows there is no more data (when the pipe/socket/etc has been closed). At least that's my take on it. – bdow Feb 21 '12 at 13:53
  • If it's a file stream, and the end of the file has been reached, then `eofbit` ought to be set. IMO. If using a file stream breaks the semantics of `readsome`, then `readsome` should not be available for file streams. – Lightness Races in Orbit Feb 21 '12 at 15:05
  • Yes, if it's reading a file from disk, I agree: `readsome` should set `eofbit` when it reaches the end of the file. I guess the standard doesn't require this, in which case we probably shouldn't count on `readsome` to detect error states by itself anyway. How to do this? I guess a `read` of 1 byte followed by `readsome` to avoid the presumed overhead of 1 system call per byte might work, though honestly for an NMEA stream you might not care, and could just write a state machine byte-by-byte. – bdow Feb 21 '12 at 16:04
  • Again, I'm not looking for any "solution", but for rationalisations about `readsome`'s semantics in the general case. :) – Lightness Races in Orbit Feb 21 '12 at 16:33
1

Others have answered why readsome won't set eofbit by design. I will suggest a way to read some bytes until EOF without setting failbit, in the same intuitive way you were trying to use readsome. This is the result of research in another question.

#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

streamsize Read(istream &stream, char *buffer, streamsize count)
{
    // This consistently fails on gcc (linux) 4.8.1 with failbit set on read
    // failure. This apparently never fails on VS2010 and VS2013 (Windows 7)
    streamsize reads = stream.rdbuf()->sgetn(buffer, count);

    // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
    // failure of the previous sgetn()
    stream.rdstate();

    // On gcc (linux) 4.8.1 and VS2010/VS2013 (Windows 7) this consistently
    // sets eofbit when the stream is at EOF as a consequence of sgetn(). It
    // should also throw if exceptions are enabled, or simply return if not,
    // when the previous rdstate() restored a failbit on Windows. On Windows
    // it usually sets eofbit even on a real read failure
    stream.peek();

    return reads;
}

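int main(int argc, char *argv[])
{
    ifstream instream("filepath", ios_base::in | ios_base::binary);
    // Also stop on failbit/badbit (e.g. the file could not be opened);
    // eofbit alone would never be set in that case
    while (instream && !instream.eof())
    {
        char buffer[0x4000];
        streamsize read = Read(instream, buffer, sizeof(buffer));
        // Do something with the first `read` bytes of buffer
    }
}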
ceztko
1

If no character is available (i.e. gptr() == egptr() for the std::streambuf), the virtual member function showmanyc() is called. I could have an implementation of showmanyc() which returns an error code, i.e. -1. Why that may be useful is a different question. However, this could set eof(). Of course, in_avail() is meant not to fail and not to block, and just to return the number of characters known to be available. That is, the loop you have above is essentially guaranteed to be an infinite loop unless you have a rather odd stream buffer.

Dietmar Kühl