
This is a follow-up to my previous question. A similar question has already been asked, but its answer does not tell me what I want to know.

From the previous question I learned that when I type characters, they are not made available to getchar() until I press Enter. At the moment I press Enter, all the characters on the line become available to the getchar() calls. Now consider the following character-counting program:

#include<stdio.h>
main()
{
  long nc;
  nc=0;
  while(getchar()!=EOF)
  ++nc;
  printf("    Number of chars are %ld ",nc);
}

Suppose I type the sequence {1,2,3,^Z,4,5,Enter} on the command line and then, on the next line, {^Z,Enter}. The output I expect is: Number of chars are 6. But the output I get is: Number of chars are 4.

This answer explains that when we input 1,2,3,^Z, the ^Z acts like Enter and 1,2,3 are sent to the getchar() calls, so the while loop in the code above runs three times. The ^Z itself is not given to getchar(), so the program doesn't terminate yet. My input was {1,2,3,^Z,4,5,Enter}: after ^Z I pressed 4, 5 and then Enter. When I press Enter, the characters 4, 5 and Enter should be given to the getchar() calls, and the while loop should execute three more times. On the last line I input {^Z,Enter}. Since there is no text before ^Z, it is treated as a character, and when I press Enter this ^Z is given to getchar() and the while loop terminates. In all, the while loop has executed 6 times, so the variable nc should be 6.

  • Why am I getting 4 as the value of nc, rather than 6?
user31782
  • Which platform do you use? If you are working with Linux you probably wanted to press ^D instead of ^Z. – Marian Dec 07 '14 at 13:23
  • @Marian I'm using Windows-7. – user31782 Dec 07 '14 at 13:25
  • @Marian You are the author of http://stackoverflow.com/a/27184020/3429430 !! You could shed some light on my problem. – user31782 Dec 07 '14 at 13:27
  • Windows differs from Linux, and the Windows ^Z is not equivalent to the Linux ^D. Windows probably forwards the ^Z to the program and then cuts off the rest of the input. – Marian Dec 07 '14 at 13:38
  • If I input {1,2,3,^Z,4,5,Enter}, then the same {1,2,3,^Z,4,5,Enter} again, then {^Z,Enter}, the output is nc=8. That means the input after Enter is not cut off. – user31782 Dec 07 '14 at 13:52
  • As far as I remember ^Z on DOS/Windows is the "End of File" marker and gets interpreted as "no further input beyond this point". So the ^Z itself gets counted but nothing beyond it. When opening a file yourself you can disable this by adding the "b" modifier to `fopen()` (for `b`inary input), but the standard input stream doesn't have this modifier applied to it. – Hartmut Holzgraefe Dec 07 '14 at 14:24
  • ^Z is only counted when there is no text before it. If ^Z got counted, the program should terminate when I input {1,2,3,^Z,4,5,Enter}. But the program takes the next line of text as further input. – user31782 Dec 07 '14 at 14:31
  • You must specify the type for `main` (or any other function). – n. m. could be an AI Dec 07 '14 at 17:49
  • ^Z not at the beginning of a line does not terminate the input, it only terminates the current line. – n. m. could be an AI Dec 07 '14 at 17:51
  • @n.m. Why should I specify the type for `main()`? – user31782 Dec 08 '14 at 12:35
  • Because the C language standard says so. – n. m. could be an AI Dec 08 '14 at 12:56
  • You count `char`s/bytes, not characters. A single character may need multiple bytes and even multiple Unicode codepoints, and you could argue that a single Unicode codepoint may hold more than one character (depending on your definition of character). – 12431234123412341234123 Jun 20 '23 at 12:00

2 Answers


Adding some output will help you:

#include <stdio.h>
int
main (void)
{
  int c, nc = 0;
  while ((c = getchar ()) != EOF)
    {
      ++nc;
      printf ("Character read: %02x\n", (unsigned) c);
    }
  printf ("Number of chars: %d\n", nc);
}

The Windows console views the ^Z input as "send input before ^Z to stdin, discard remaining input on the line (including the end-of-line delimiter), and send ^Z" unless it is at the beginning of a line, in which case it sends EOF instead of ^Z:

123^Z45
Character read: 31
Character read: 32
Character read: 33
Character read: 1a
^Z12345
Number of chars: 4

Also, Windows always waits for the Enter/Return key, with the exception of very few key sequences like ^C or ^{Break}.

  • 123^Z45 gives me Character read: 31 Character read: 32 Character read: 33 Character read: **1a** (instead of 1b). Isn't the value of `^Z` `-1`? – user31782 Dec 08 '14 at 04:45
  • @user31782 It is `1a`, and I've edited the answer to reflect that. `^A` has a hex code of `01`, `^B` has a hex code of `02`, etc., meaning `^Z` is `1a`. `EOF` is `-1` on Windows, and when you type `^Z` at the beginning of a line, it sends `EOF` instead of hex code `1a`, which might be why you think `^Z` is `-1`. –  Dec 08 '14 at 05:00
  • What is ASCII value of `^Z`? Is the compiler taking it differently? – user31782 Dec 08 '14 at 05:02
  • The ASCII value of `^Z` is 26 (1a in hex). So when ^Z is at the beginning of a line, it takes the value `-1`, otherwise the ASCII value `1a`. – user31782 Dec 08 '14 at 05:06
  • @user31782 Yes that is correct. `^Z` at the beginning of a line makes `getchar` return `EOF` (-1). Otherwise `getchar` returns the ASCII value of `^Z`. –  Dec 08 '14 at 05:07

^Z, or Ctrl-Z, means end-of-file for text files (a convention from old MS-DOS). getchar() is equivalent to fgetc(stdin) and is often a macro. "fgetc returns the character read as an int or returns EOF to indicate an error or end of file."

See also _set_fmode; however, I am not sure whether it changes the behaviour right away or whether you have to close and reopen the file. I am also not sure whether you can close and reopen stdin (I don't do much console programming anymore).

Paul Ogilvie
  • Could you reword this statement[_So the input library sees end-of-file and then forwards the buffer read so for to the program._] differently. I'm not a native speaker. – user31782 Dec 07 '14 at 14:40