
I am running the following simple program using Visual Studio 2010. The purpose is to see what happens if I define the variable c as char or as int, since the getchar() function returns an int (a widely known pitfall in the C programming language; see "int c = getchar()?").

#include <stdio.h>

int main()
{
    char c;
    //int c;

    while ((c = getchar()) != EOF)
        putchar(c);

    printf("%d\n", c);
    return 0;
}

When I type some characters into this program from the console, I see a strange phenomenon, as shown in the following figure. If the EOF key combination follows a sequence of characters (1st line), it is not recognized as EOF (a small right arrow is output instead, 2nd line). However, if it is entered on a line by itself (4th line), it is recognized correctly and the program terminates.

I didn't test this program on Linux, but can someone explain why this happens?

enter image description here

Bloodmoon
  • Is this behavior different from when `c` is declared an `int`? – Fred Foo Jul 11 '13 at 15:03
  • @larsmans No, it's not. I got the same result if `c` is of type `int`. – Bloodmoon Jul 11 '13 at 15:10
  • Try typing ÿ (y-umlaut, LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF); I expect your program stops on reading that character too when you use `char c;` and it shouldn't really do that. You'd be OK if you used `int c;`. Remember: `getchar()` returns an `int`! – Jonathan Leffler Jul 11 '13 at 15:41
  • @JonathanLeffler Yes, `getchar()` returns an `int`, and I know that defining `c` as `char` is a bug here. But that is not the point I want to discuss; I want to know why the `EOF` character is not correctly recognized from the console. – Bloodmoon Jul 11 '13 at 15:48

2 Answers


What you're describing is, basically, the way terminals are designed.

You need to remember that EOF is not a character. When you type "ABCDEFCTRL-Z", you're entering eight input characters: A, B, C, D, E, F, CTRL-Z, and Return. The only thing special about CTRL-Z (or CTRL-D on Unix/Linux) is that if you type it as the first thing on a new line, then instead of entering a character, the terminal behaves as though the end of the input file has been reached, and getchar() returns EOF. Every value that fits into an unsigned char is a valid character return value for getchar(), so EOF has to lie outside that range: it is negative, which is why getchar() and family are defined to return int.
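As a rough illustration of why the result has to be kept in an int, here is a minimal sketch. The helper name count_bytes is made up for this example, and it reads from a FILE * instead of the console so that arbitrary bytes can be fed to it. It assumes a binary stream, where the byte 0x1A (CTRL-Z) has no special meaning.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the original program:
   counts the bytes read from `in` until getc() reports end of input. */
long count_bytes(FILE *in)
{
    int c;      /* int, not char: must hold every unsigned char value plus EOF */
    long n = 0;

    /* On a binary stream, byte 0xFF arrives as 255 and CTRL-Z as 26.
       Neither compares equal to EOF, so neither ends the loop early. */
    while ((c = getc(in)) != EOF)
        n++;

    return n;
}
```

Calling count_bytes(stdin) counts console input in the same way; with char c instead of int c, a 0xFF byte could be mistaken for EOF on platforms where plain char is signed.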

This isn't my real name
  • thanks for your answer, but it still does not explain why `EOF` is not correctly recognized when it follows some characters, i.e. not in a standalone line. – Bloodmoon Jul 11 '13 at 16:17
  • Because that's the way it was designed. When you press the key (Control-Z, or Control-D, or whatever else) that is used to signal EOF, it is only treated as being a signal of EOF at the start of a line. Fundamentally, it works this way because it was designed to work this way, and that's pretty much it. And yes, you'll see the same sort of behavior on Unix as well. – This isn't my real name Jul 11 '13 at 16:21
  • I see, thanks! It's strange that none of the books I read mentioned that `EOF` has to be entered on a line by itself. – Bloodmoon Jul 11 '13 at 17:50

If you change your program a little and add two printf statements, you will see that the program actually does read the CTRL+Z combination correctly (ASCII code 26):

#include <stdio.h>

int main()
{
    char c;
    //int c;

    while ((c = getchar()) != EOF) {
        printf("%d\n", c);
        putchar(c);
        printf("\n");
    }

    printf("%d\n", c);
    return 0;
}

But as the other answer says, CTRL+Z must be on its own line in order to be interpreted as end of input. On Windows, each line ends with an EOL character sequence except the last line, which is followed by the EOF marker.
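To make the difference between character 26 and the end-of-input condition easier to see, here is a small sketch along the same lines. The helper name dump_codes is invented for this example, and it takes FILE * arguments so that it can be run on a prepared binary stream as well as on stdin and stdout.

```c
#include <stdio.h>

/* Hypothetical helper: writes the numeric code of every character read
   from `in` to `out`, one per line, then a final "EOF" line once getc()
   reports end of input. */
void dump_codes(FILE *in, FILE *out)
{
    int c;  /* int so that EOF stays distinct from every character code */

    while ((c = getc(in)) != EOF)
        fprintf(out, "%d\n", c);

    /* Reached only when the stream itself signals end of input,
       not when a byte with value 26 is read from a binary stream. */
    fputs("EOF\n", out);
}
```

Used as dump_codes(stdin, stdout), a CTRL+Z typed after other characters on the same line should show up as the code 26, matching what the program above prints, while the "EOF" line appears only once the console actually signals end of input.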

gst
  • Hmm, I am getting confused. In fact, in debug mode I have seen that the value of `c` is 26, but I regarded this as the result of `EOF` not being recognized correctly. I expect `c` to be `-1` when `EOF` is input, since by default `c` is a `signed char` in VS 2010 (also in gcc, I believe), and this is the reason why defining `c` as `char` does not lead to errors most of the time if the input characters are all English characters. – Bloodmoon Jul 11 '13 at 16:07
  • And, could you please explain a little more about the `EOL`? – Bloodmoon Jul 11 '13 at 16:08
  • No, the actual value of CTRL+Z according to the ASCII table is 26, and having this value in `c` is correct behavior. Take a look at this page: http://rabbit.eng.miami.edu/class/een218/getchar.html – gst Jul 11 '13 at 17:02
  • The End of Line (EOL) character (0x0D0A, \r\n) is actually two ASCII characters and is a combination of the CR and LF characters. It moves the cursor both down to the next line and to the beginning of that line. This character is used as a new line character in most other non-Unix operating systems including Microsoft Windows, Symbian OS and others. – gst Jul 11 '13 at 17:07
  • Ahh, I know about `CRLF`, but I didn't know it is called `EOL`. And that raises one more question: since there are two characters, why does `getchar()` only read the Line Feed (\n) and ignore the Carriage Return (\r)? Does the C compiler on Windows take the `CRLF` as a whole and treat it as `\n`? I know that the EOL on Linux and Mac is `\n`. – Bloodmoon Jul 11 '13 at 17:49
  • I think it is a matter of the standards, which specify how such special character sequences are mapped. Because of the standards, I think the implementation has to map \r\n to \n when returning input and map \n back to \r\n when printing. It is an implementation issue, I think. It is a good idea to look at the implementations and the standards to find the exact answer to this question. – gst Jul 11 '13 at 18:20
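The mapping discussed in the last two comments can be observed with a short sketch. The C standard allows text streams to translate line endings, and the Windows C runtime does so for `\n` and `\r\n`. The function names and the file name used in the usage note are assumptions made for this example only.

```c
#include <stdio.h>

/* Hypothetical helper: writes `text` to `path` through a text-mode stream,
   then returns the number of bytes stored in the file as seen in binary
   mode, or -1 on error. */
long raw_size_after_text_write(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");   /* text mode: '\n' may be translated on output */
    long n = 0;

    if (f == NULL)
        return -1;
    fputs(text, f);
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");        /* binary mode: bytes exactly as stored */
    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        n++;
    fclose(f);
    return n;
}

/* Hypothetical helper: counts the characters of `path` as seen through a
   text-mode stream, where each line ending is delivered as a single '\n'.
   Returns -1 on error. */
long text_length(const char *path)
{
    FILE *f = fopen(path, "r");
    long n = 0;

    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

For the text "a\nb\n" written to a scratch file such as crlf_demo.txt, raw_size_after_text_write should report 6 bytes on Windows (two CRLF pairs) and 4 on Linux, while text_length reports 4 on both, because the text-mode stream hands each line ending back as one `\n`.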