I usually teach my students that the safe way to tackle file input is:

while (true) {
    // Try to read
    if (/* failure check */) {
        break;
    }
    // Use what you read
}

This has saved me and many others from the classic and, most of the time, wrong:

while (!is.eof()) {
    // Try to read
    // Use what you read
}
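
To see the typical failure, here is a small sketch (the input string is just an illustration): eof() only becomes true after a read has already failed, so with a trailing newline the body runs one extra time with a stale value.

#include <iostream>
#include <sstream>

int main() {
    std::istringstream is("1 2 3\n");   // trailing newline, like most text files
    int a = 0;
    while (!is.eof()) {
        is >> a;                 // the fourth attempt fails while skipping whitespace
        std::cout << a << '\n';  // ...but is still "used": prints 1 2 3 3
    }
}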

But people really like this form of looping, so it has become common to see this in student code:

while (is.peek()!=EOF) { // <-- I know this is not C++ style, but this is how it is usually written
    // Try to read
    // Use what you read
}

Now the question is: is there a problem with this code? Are there corner cases in which things don't work exactly as expected? Ok, it's two questions.

EDIT FOR ADDITIONAL DETAILS: during exams we sometimes guarantee the students that the file will be correctly formatted, so they don't need to do all the checks and only need to verify whether there's more data. And most of the time we deal with binary formats, where whitespace is not a concern at all (because every byte of the data is meaningful).
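
To make the binary case concrete, here is the kind of loop I mean (a minimal sketch: the file name and the 4-byte record layout are made up for illustration, and it assumes the file's byte order matches the host's):

#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    std::ifstream is("data.bin", std::ios::binary);  // hypothetical file of raw int32_t records
    while (is.peek() != EOF) {                       // "is there more data?"
        std::int32_t value;
        is.read(reinterpret_cast<char*>(&value), sizeof value);
        if (!is) break;                              // guards against a truncated last record
        std::cout << value << '\n';
    }
}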

While the accepted answer is totally clear and correct, I'd still like someone to try to comment on the joint behavior of peek() and unget().

The unget() question came to my mind because I once observed (I believe it was on Windows) that peeking right at the 4096-byte internal buffer boundary (thus forcing a new buffer to be loaded) made ungetting the previous byte (the last one of the previous buffer) fail. But I may be wrong. So that was my additional doubt: perhaps there is something well known that I missed, whether specified in the standard or in particular library implementations.
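
Something along these lines is what I had in mind (a rough sketch of the probe, not a conforming test: the file name is a placeholder, and the 4096 figure is just the buffer size I believe I observed, not anything the standard guarantees):

#include <fstream>
#include <iostream>

int main() {
    std::ifstream is("big.bin", std::ios::binary);  // assumed to be larger than one buffer
    for (int i = 0; i < 4096; ++i) is.get();        // consume exactly one buffer's worth

    is.peek();                                      // may force the next buffer to be loaded
    if (is.unget()) {                               // try to step back across the boundary
        std::cout << "unget succeeded, reread: " << is.get() << '\n';
    } else {
        std::cout << "unget failed after peek at the buffer boundary\n";
    }
}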

Costantino Grana
  • You will want to take a look at [Why !.eof() inside a loop condition is always wrong.](https://stackoverflow.com/q/5605125/9254539) 4th condition is just a little better. Generally always better to control the read with the read function itself, e.g. `while (std::cin >> var) { /* handle var */ }` or `while (getline (fp, str)) { /* handle string */ }` and so on... – David C. Rankin Feb 21 '20 at 09:57

1 Answer

is.peek()!=EOF tells you whether there are still characters left in the input stream, but it doesn't tell you whether your next read will succeed:

while (is.peek()!=EOF) {
    int a;
    is >> a;
    // Still need to test `is` to verify that the read succeeded
}

is >> a could fail for a number of reasons, e.g. the input might not actually be a number.
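
For example (a small illustrative sketch, not taken from the question): with malformed input the peek() test keeps the loop going right into a failed read, and the bogus result gets used:

#include <iostream>
#include <sstream>

int main() {
    std::istringstream is("1 2 oops 4");
    while (is.peek() != EOF) {
        int a;
        is >> a;                 // fails on "oops": failbit is set, a becomes 0 (since C++11)
        std::cout << a << '\n';  // prints 1, 2, then the bogus 0
        // afterwards peek() returns EOF because the stream is no longer good(),
        // so the loop stops here and the trailing 4 is never read
    }
}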

So there is no point in this check when you could instead do

int a;
while (is >> a) { // reads until failure of any kind
    // use `a`
}

or, maybe better:

for (int a; is >> a;) { // reads until failure of any kind
    // use `a`
}

or your first example, in which case the is.peek()!=EOF test in the loop becomes redundant.

This is assuming you want the loop to exit on every failure, following your first code example, not only on end-of-file.
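
For completeness, here is roughly what your first example looks like with the failure check spelled out (reading ints from std::cin purely as an illustration):

#include <iostream>

int main() {
    while (true) {
        int a;
        if (!(std::cin >> a)) {   // failure check: end of input or malformed data
            break;
        }
        std::cout << a << '\n';   // use what you read
    }
}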

walnut
  • Or `for(int a; is >> a;)` – Caleth Feb 21 '20 at 09:30
  • Ok, this is all clear. So, if I assume that the only reason why the read could fail is reaching the end of file (a very strong and generous assumption), the idea is correct, right? This should work even if later someone plays with `std::istream::unget()`? – Costantino Grana Feb 21 '20 at 10:26
  • @CostantinoGrana Are you sure that is a good assumption? For example if there are whitespaces at the end of the input, then `peek()` will not return EOF, although all formatted reads will fail. I am not sure what you have in mind with `unget()`, but I suppose `unget()` could change `peek()` returning EOF to not returning EOF. – walnut Feb 21 '20 at 10:34
  • @walnut No, I'm sure it is not a good assumption. But during exams you sometimes guarantee the students that the file will be correctly formatted, so they don't need to do all the checks. And most of the time we deal with binary formats, which allow you to not worry about whitespace at all (because the data is all meaningful). The `unget()` stuff came to my mind because I once observed (I believe it was on Windows) that by peeking at the 4096 buffer limit, ungetting the previous byte (last of the previous buffer) failed. But I can be wrong. So that was my doubt: something known I missed. – Costantino Grana Feb 21 '20 at 10:41
  • @CostantinoGrana You might want to clarify these concerns in your question, so that someone else might add an answer. I honestly don't think I know the internals of the streams library well enough to answer that. – walnut Feb 21 '20 at 10:45