13

I'm learning C++ and trying to understand: why doesn't the EOF character (Ctrl + Z on Windows) break the while loop if it is put at the end of a line?

My code:

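    #include <iostream>
    using namespace std;

    int main() {
        char ch;
        // Echo each non-whitespace character until extraction fails.
        while (cin >> ch) {
            cout << ch;
        }
    }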

When I enter ^Z, the loop breaks, but when I enter 12^Z, it doesn't.

Zach
Cutter
  • could have to do with 12^z != ^z ... 12^z will not evaluate to false – Mare Infinitus Jul 07 '12 at 21:53
  • 1
    Unix systems work the same way; CTRL-D in the middle of a line is ignored (or maybe not completely; a bash shell will beep, but still ignore it), it only works at the beginning of a line. I have no idea whether there's a real reason for that, or whether some guy back in the 60s when Unix was invented thought this would be a nice thing to have, and it's been like that ever since with nobody knowing why :-) – Christian Stieber Jul 07 '12 at 21:57
  • 1
    @ChristianStieber: On Unix-like systems, a single control-D triggers an end-of-file condition at the beginning of a line; otherwise, *two* control-Ds trigger an end-of-file condition. – Keith Thompson Jul 07 '12 at 22:14

3 Answers

9

You won't find an answer to your question in the C++ standard.

cin >> ch will be a "true" condition as long as there's neither an end-of-file condition nor an input error. How an end-of-file condition is triggered is not specified by the language, and it can and will vary from one operating system to another, and even with configuration options in the same OS. (For example, Unix-like systems use control-D by default, but that can be altered by the stty command.)

Windows uses Control-Z to trigger an end-of-file condition for a text input stream; it just happens not to do so other than at the beginning of a line.

Unix behaves a bit differently; it uses Control-D (by default) at the beginning of a line, or two Control-Ds in the middle of a line.

For Unix, this applies only when reading from a terminal; if you're reading from a file, control-D is just another non-printing character, and it doesn't trigger an end-of-file condition. Windows appears to recognize control-Z as an end-of-file trigger even when reading from a disk file.

Bottom line: Different operating systems behave differently, largely for obscure historical reasons. C++ is designed to work with any of these behaviors, which is why it's not specific about some of the details.

Keith Thompson
4

The C and C++ standards allow text streams to do quite Unholy things in text mode, which is the default. These Unholy Things include translation between internal newline markers and external newline control characters, as well as treating certain characters or character sequences as denoting end of file. In Unix-land it's not done, but in Windows-land it's done, so the code can relate only to the original Unix-land conventions.

This means that in Windows, there is no way to write a portable C or C++ program that will copy its input exactly to its output.

While in Unix-land, that's no problem at all.

In Windows, a line consisting of a single [Ctrl Z] is by convention an End Of File marker. This is so not only in the console, but also in text files (depending a bit on the tools). Windows inherited this from DOS, which in turn inherited the general idea from CP/M.

I'm not sure where CP/M got it from, but it's only similar to, and not at all the same as, Unix's [Ctrl D].

Over in Unix-land the general convention for end of file is just "no more data". In the console a [Ctrl D] will by default send your typed text immediately to the waiting program. When you haven't typed anything on the line yet, 0 bytes are sent, and a read that returns 0 bytes has by convention encountered end-of-file.

The main difference is that internally in Windows the text end of file marker is data, that can occur within a file, while internally in Unix it's lack of data, which can't occur within a file. Of course Windows also supports ordinary end of file (no more data!) for text. Which complicates things – Windows is just more complicated.


    #include <iostream>
    using namespace std;

    int main()
    {
        char ch;
        while(cin >> ch) {
            // 0+ch promotes the char to int, so its numeric code is printed too.
            cout << 0+ch << " '" << ch << "'" << endl;
        }
    }

Cheers and hth. - Alf
  • What I still don't understand, is that at some point, when the 1 and 2 from my code are read and put in ch, they're supposed to disappear from cin, and thus the only character remaining is [Ctrl + Z], just as if it was a line consisting of a single EOF. Then cin.eof() should return true. – Cutter Jul 07 '12 at 22:48
  • 2
    [Ctrl Z] in the input, alone on a line, is one thing. In Windows it will be translated. A [Ctrl Z] that has survived the text translation is a very different thing. Try out the code I now added to the answer. – Cheers and hth. - Alf Jul 07 '12 at 22:57
1

This happens because cin >> ch evaluates to false when the input is ^Z.

In more detail: cin.eof() will return true in that case, so the while condition, which implicitly tests the stream state (strictly, it checks fail(), which is also set when an extraction hits end of file), becomes false and the loop ends.

If you input 12^Z, eof() will return false, as it can parse a valid input value, hence it will not stop the loop.

You might also be interested in this SO question:

SO on semantics of flags

Community
Mare Infinitus
  • 1
    Thanks for that clarification. However, as far as I understand, when a character is put into `ch`, it is removed from cin. Thus, after 1 and 2 are put into ch, only ^Z remains in cin and cin.eof() should return true. The loop should break then, is that right? – Cutter Jul 07 '12 at 22:07
  • The inputted strings are not sequentially computed afaik. they are computed as a single input. and only if this input is the eof, then eof is set to true. – Mare Infinitus Jul 07 '12 at 22:10
  • But why isn't eof() set to true when every other character (before EOF) has been read? – Cutter Jul 07 '12 at 22:14
  • as i said, the input is not read sequentially but blockwise afaik – Mare Infinitus Jul 07 '12 at 22:15
  • I have my doubts whether this explains anything. Even if it can parse a valid input value, it should still trigger the `eof` _after_ that, shouldn't it? And blocks are not in any way required to be connected with newlines at all, though in practise they usually are. – leftaroundabout Jul 07 '12 at 22:16
  • no, it should not. there is some input in that block, it is especially not eof. but this seems to be some question of taste. – Mare Infinitus Jul 07 '12 at 22:17