My question is about how EOF is interpreted in the middle of a line of input. Here is an example:

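#include <stdio.h>

int main(void) {
    int a, b;

    printf("enter something >\n");
    if (scanf("%d", &a) != 1) {    /* no leading integer was entered */
        return 1;
    }
    /* b is an int so that EOF can be told apart from every character value */
    while ((b = getchar()) != EOF) {
        printf("%i\n", b);
    }
    return 0;
}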

I run the program and enter:

 1hello^Z(control+z)abc

the output is:

 104 (ascii number for h)
 101 (for e)
 108 (l)
 108 (l) 
 111 (o)
 26 (what is this?)

The digit 1 is read by scanf and the rest stays in the buffer. getchar() then returns every character up to ^Z, which is what I expected, since Ctrl+Z closes stdin. But where does 26 come from? If the last thing getchar() reads is EOF, why isn't -1 the last value? And why doesn't the program leave the loop when it reads ^Z? Why do I need to send EOF one more time with Ctrl+Z to terminate the loop? 26 is the ASCII code for SUB, and I don't know what to make of this.

Thank you.

  • Ctrl-Z *is* ASCII 26. – Paul Roub Jan 29 '15 at 19:49
  • I thought control Z was automatically converted to EOF which has a negative value, usually -1 – Mcs Jan 29 '15 at 19:53
  • EOF will be *reported* as -1, but it's not really a conversion. DOS/Windows terminals have a nasty habit of actually including Ctrl-Z in the input first. Someone will probably come along and correct me on the details, but that's basically what's happening. – Paul Roub Jan 29 '15 at 19:57
  • Yeah, that is what I figured, thanks. It reports EOF as -1 only if I enter it at the beginning of a line. In the middle of a line it reports it as 26: it stops reading stdin, but doesn't really recognize EOF until I enter it one more time at the beginning of a line. Weird. I will get a Mac. – Mcs Jan 29 '15 at 20:05
  • What OS are you using? – Jens Jan 29 '15 at 20:44
  • Windows 8.1. I am installing Linux now :)) – Mcs Jan 29 '15 at 20:47
  • I'm not sure that I understand the question, but maybe this is relevant: [Preventing Windows program from interpreting ^Z as end of file](http://stackoverflow.com/questions/27545159/preventing-windows-program-from-interpreting-z-as-end-of-file). – yellowantphil Jan 29 '15 at 21:01
  • I'm a little surprised you're seeing the ^Z in the input. On Unix (Mac, Linux), the corresponding character is normally ^D (control-D instead of control-Z). If you type some data on a line and then ^D, the data is sent. That's a non-zero number of bytes, so the read succeeds (albeit with no newline), and continues. If you type more data, that will be read and returned to the app, either when ^D is typed again or when newline (enter) is pressed. If you type ^D twice in a row, the second one means that zero bytes are available, and that's EOF. Or it's EOF if you type ^D at the start of a line. – Jonathan Leffler Jan 29 '15 at 21:29
  • which compiler and library? (e.g. cygwin or mingw console programs will go with the ^D convention) – M.M Jan 29 '15 at 21:36

1 Answer

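The 26 is the Ctrl+Z itself. Typed in the middle of a line, it is not treated as end of file: getchar() returns it as the character SUB, so it is printed like any other character and the loop keeps going. See http://en.wikipedia.org/wiki/Substitute_character. b only becomes EOF, and the loop only ends, when you press Ctrl+Z again at the start of a line.

In the ASCII and Unicode character sets, this character (SUB) is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, by convention often described as ^Z).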