4

I was working through K&R, which uses getchar() extensively for input in its early chapters. The problem is that I don't fully understand its behavior.

Below is a piece of code:

#include <stdio.h>

int main() {
    char c,i;
    char line[10000];
    i = 0;

    while((c=getchar()) != EOF && c!= '\n') {
        line[i++] = c;
    }

    printf("%s",line);
}

The code works as expected.

My problem with this is: why does it terminate when I press Enter? How does it know that the newline is the termination condition while I am still typing input and the program is waiting at c=getchar()?

I know it is not default getchar() behavior, as it is with scanf(), because when I remove the newline condition the program doesn't terminate at a newline. Maybe my question goes beyond getchar() and is a more general one.

Suppose my input is Hello and I press enter.

First, the c variable becomes 'H' and gets stored in line, then 'e', then 'l', then 'l', then 'o'; after that it encounters the newline and the loop terminates. That part I understand.

I want to know why it only started reading the characters after I pressed Enter. I was expecting to get a new line and be able to write some more characters.

Roberto Caboni
mr.loop
  • 1
    When you enter a newline `c != '\n'` is false, and thus the while condition as a whole is likewise. Thus ends the loop. This code exhibits *undefined behavior* regardless. `line` is not terminated and as an uninitialized automatic variable, there is no guarantee for a terminator already in place. Therefore passing it as the argument to a `%s` format specifier, which mandates a terminated string, invokes UB, and is at-best a gamble. – WhozCraig Jan 24 '21 at 10:12
  • but the newline condition `c != '\n'` has c in it. While I am writing `Hello`, no matter what, c is H. So, unless you are saying that the loop is running as I am writing, the termination doesn't make sense. – mr.loop Jan 24 '21 at 10:15
  • So you're asking why/whether `stdin` is *line buffered*? (Because it usually is, fyi.) – WhozCraig Jan 24 '21 at 10:17
  • @WhozCraig I am just wondering why `c != '\n'` is invoked while c is still empty, or H maybe. Or is there some other reason the input terminated when I pressed Enter? – mr.loop Jan 24 '21 at 10:19
  • 1
    *"why c != '\n' invoked while c is still empty"* - It's not. Once the newline is in the stream the buffer is sent, and in your case consumed one char at a time. Btw, in my experience it is the terminal, not the actual runtime, doing the buffering you're experiencing. When submitting input via IO redirect (so no terminal interjection) that terminal line buffering is circumvented. – WhozCraig Jan 24 '21 at 10:24
  • `c` should be `int` – M.M Jan 24 '21 at 10:25
  • @M.M yes, but doesn't change the question – mr.loop Jan 24 '21 at 10:26

5 Answers

3

There are two parts to understanding that code, and there is also an error that chqrlie has made a good argument towards fixing.

Part 0: why you should use int for reading with getchar

As many have commented, using char c is dangerous if you are going to read with getchar(), because getchar() returns an int: either the character read, as an unsigned char value, or EOF, a negative value that is generally #defined as -1 to signal end-of-file. Plain char may or may not be signed. If it is unsigned, c can never compare equal to EOF and the loop never sees end-of-file; if it is signed, a valid byte such as 0xFF can be mistaken for EOF. So let us change the declaration to

int c,i; 
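
To make the difference concrete, here is a minimal sketch of what happens to EOF when the result of getchar() is squeezed into a character type. The two helper names are made up for illustration. The second helper assumes the common case where EOF is -1 and an out-of-range conversion to signed char wraps around, which is implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: is EOF still recognisable after being stored
   in an unsigned char? Returns 1 if yes, 0 if no. */
int eof_survives_unsigned_char(void) {
    assert(EOF < 0);                       /* guaranteed by the C standard */
    unsigned char c = (unsigned char)EOF;  /* wraps to a value in 0..UCHAR_MAX */
    return c == EOF;                       /* c promotes to a non-negative int */
}

/* Hypothetical helper: is the valid byte 0xFF mistaken for EOF after
   being stored in a signed char? Returns 1 if yes, 0 if no. */
int byte_ff_looks_like_eof(void) {
    signed char c = (signed char)0xFF;     /* implementation-defined; -1 on common platforms */
    return c == EOF;                       /* true when EOF is -1 */
}
```

With int c neither problem can occur, because int can hold every unsigned char value and EOF as distinct values.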

Part 1: why is \n special

According to man, getchar() is equivalent to getc(stdin), which is equivalent to fgetc() except that it may be implemented as a macro which evaluates its stream (stdin, in this case) more than once.

Importantly, every time it is called, it consumes a character from its input. Every call to getchar returns the next character from the input, as long as there are characters to return. Once the input has ended (or a read error occurs), it returns EOF instead.

Now, stdin, the standard input, is generally line-buffered when it is connected to a terminal, which means that programs will not have access to the typed characters until the line is terminated with a \n. You can test this with this program:

#include <stdio.h>

int main() {
    int c,i;
    char line[10000];
    i = 0;

    while((c=getchar()) != EOF && c!= 'a') { // <-- replaced `\n` with `a`
        line[i++] = c;
    }

    printf("%s",line);
}

If you run it, it will still not do anything until Enter is pressed; but once it is, the stored input will end just before the first a (the a itself is not included). Note that the output afterwards is undefined, since there is no guarantee that a \0 terminates the string. To avoid this pitfall, see the rewritten program at the very end.
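
The same experiment can be tried without the undefined output. The sketch below uses a hypothetical helper, read_until(), that takes the stream and the stop character as parameters (neither appears in the original program) and always terminates the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy characters from `in` into `buf` until `stop`,
   EOF, or a full buffer is reached. Always null-terminates `buf` and
   returns the number of characters stored. */
size_t read_until(FILE *in, int stop, char *buf, size_t size) {
    size_t i = 0;
    int c;  /* int, so that EOF stays distinguishable from real characters */

    assert(in != NULL && buf != NULL && size > 0);
    while (i + 1 < size && (c = getc(in)) != EOF && c != stop) {
        buf[i++] = (char)c;
    }
    buf[i] = '\0';
    return i;
}
```

Calling read_until(stdin, 'a', line, sizeof line) and typing banana followed by Enter should store just "b": nothing reaches the program until Enter is pressed, and then the loop stops at the first a, leaving the rest of the line unread.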

Part 2: why does the loop condition work as it does

You can rewrite the loop condition as follows. This makes it easier to see what is going on:

// loop condition looks up next char, tests it against EOF and `\n`
while((c=getchar()) != EOF && c!= '\n') { line[i++] = c; }

// loop condition broken up for readability; fully equivalent to above code
while (1) {          // `true` would need <stdbool.h>
   c = getchar();
   if (c == EOF || c == '\n') {
      break; // exit loop
   } else {
      line [i++] = c;
   }
}
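
One consequence of the expanded form is easier to see with a counter: the terminating newline is read (consumed) by the loop, but never stored. The following is a rough sketch only; count_reads() and its parameters are hypothetical names, and it reads from a FILE * so it can be tried on any stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: run the loop on `in`, store at most size-1
   characters in `buf`, write the stored count to *stored, and return
   how many times getc() was called. */
size_t count_reads(FILE *in, char *buf, size_t size, size_t *stored) {
    size_t reads = 0, i = 0;
    int c;

    assert(in != NULL && buf != NULL && stored != NULL && size > 0);
    while (i + 1 < size) {
        c = getc(in);      /* exactly one call per pass through the loop */
        reads++;
        if (c == EOF || c == '\n') {
            break;         /* the terminator was read but is not stored */
        }
        buf[i++] = (char)c;
    }
    buf[i] = '\0';
    *stored = i;
    return reads;
}
```

For the input Hello followed by Enter, this performs six reads and stores five characters.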

Epilogue: improved code

#include <stdio.h>
#define BUFSIZE 10000

int main() {
    char line[BUFSIZE]; // avoid magic number
    int c, i = 0;       // initialize at point of declaration
    
    while (i<BUFSIZE-1              // avoid buffer overflow
         && (c=getchar()) != EOF    // do not read past EOF
         && c!= '\n') {             // do not read past end-of-line
        line[i++] = c;
    }

    line[i++] = 0;      // ensure that the string is null-terminated
    printf("%s",line);
    return 0;           // explicitly return "no error"
}
tucuxi
  • 17,561
  • 2
  • 43
  • 74
  • @chqrlie the question is not "how do I C", but rather "why does this work". My goal is not to rewrite the program to be fully correct, or teach all of C, but rather to explain how one particular line, the one in the question, works. – tucuxi Jan 25 '21 at 09:31
  • @chqrlie you are right that getting the `!= EOF` *is* part of the question. Addressed. – tucuxi Jan 25 '21 at 16:17
  • much better! I would just add a trailing newline in `printf("%s\n", line);` to ensure output is properly displayed on the terminal as `line` does not have a newline. – chqrlie Jan 25 '21 at 17:52
3

The program is incorrect and can invoke undefined behavior.

For starters the variable c shall be declared like

int c;

Otherwise the condition

(c=getchar()) != EOF

can always be true, even if the user tries to end the input. The problem is that the macro EOF expands to a negative value of type int. On the other hand, the type char can behave like the type unsigned char, in which case the variable c, promoted to the type int, always holds a non-negative value.

Secondly, the type char cannot in any case hold a value equal to 10000, which is the size of the character array. So the variable i should be declared with at least the type short int.

The while loop shall check whether the current value of the index variable i is already greater than or equal to the size of the character array. Otherwise this statement

    line[i++] = c;

can write beyond the character array.

Finally, the resulting character array line does not contain a string, because the terminating zero character '\0' was not appended to the entered sequence of characters. As a result this call

printf("%s",line);

invokes undefined behavior.

The program can look like this:

#include <stdio.h>

int main( void ) 
{
    enum { N = 10000 };
    char line[N];

    size_t i = 0;
 
    for ( int c; i + 1 < N && ( c = getchar() ) != EOF && c != '\n'; i++ ) 
    {
        line[i] = c;
    }

    line[i] = '\0';

    puts( line );
}

That is, the loop keeps filling the character array as long as there is enough space in the character array line

i + 1 < N 

the user does not interrupt the input

( c = getchar() ) != EOF

and the user does not press the Enter key to finish entering the string

c != '\n'

After the loop the terminating zero is appended

    line[i] = '\0';

Now the array line contains a string that is outputted in the statement

    puts( line );

So, for example, if the user types this sequence of characters

Hello world!

and then presses the Enter key (which sends the newline character '\n' to the input buffer), the loop stops iterating. The newline character '\n' is not written to the string. After the loop, the terminating zero character '\0' is appended to the characters stored in the array line.

So the array will contain the following string

{ 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0' }

which is then output.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
2

Your understanding is basically correct but there are some problems in the code and the input mechanisms are more complex than you infer:

  • c should have type int to accommodate all the values returned by getc(), namely all values of type unsigned char (in most current systems 0 to 255) and the special negative value EOF (usually -1).
  • i should also have type int, or possibly size_t to index into the line array properly. The posted code with a char type may have undefined behavior if you input a line longer than 127 characters.
  • you should test that i stays within the boundaries of array line. It would take a very long input line, but that is possible and easy to produce by redirecting from a file.
  • line must be null terminated before passing it to printf as an argument for the %s format.

Here is a modified version:

#include <stdio.h>

int main() {
    int c, i;
    char line[10000];

    i = 0;
    while (i < sizeof(line) - 1 && (c = getchar()) != EOF && c != '\n') {
        line[i++] = c;
    }
    line[i] = '\0';   // null terminate the array.

    printf("%s\n", line);
    return 0;
}

Regarding the behavior of the console in response to your program's input requests, it is implementation defined but usually involves 2 layers of buffering:

  • the FILE stream package implements a buffering scheme where data is read from or written to the system in chunks. This buffering can be controlled with setvbuf(). 3 settings are available: no buffering (which is the default for stderr), line buffered (usually the default for stdin and stdout when attached to a character device) and fully buffered with a customisable chunk size (common sizes are 512 and 4096).
  • when you call getchar() or more generally getc(stream), if a byte is available in the stream's buffer, it is returned and the stream position is incremented, otherwise a request is made to the system to fill the buffer.
  • if the stream is attached to a file, filling the buffer performs a read system call or equivalent, which succeeds unless at the end of file or upon a read error.
  • if the stream is attached to a character device, such as a terminal or a virtual tty like a terminal window on the graphics display, another layer of buffering gets involved where the device driver reads input from the input device and handles some keys in a special way such as Backspace to erase the previous character, cursor movement keys to move inside the input line, Ctrl-D (unix) or Ctrl-Z (windows) to signal the end of file. This layer of buffering can be controlled via the tcsetattr() system call or other system specific APIs. Interactive applications such as text editors typically disable this and retrieve raw input directly from the input device.
  • the keys typed by the user are handled by the terminal to form an input line, which is sent back to the C stream API when the user types Enter (translated to a system-specific end-of-line sequence). The stream functions then perform another set of transformations (e.g. converting CR/LF to '\n' on legacy systems) and the line of bytes is stored in the stream buffer. When getc() finally gets a chance to return the first available byte, the full line has already been typed and entered by the user and is pending in the stream or the device buffers.
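
As a rough illustration of the terminal layer described above, the sketch below switches a terminal out of line-editing (canonical) mode with tcsetattr(). It is POSIX-only, the two function names are hypothetical, and it assumes the caller restores the saved settings before exiting.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put the terminal on `fd` into non-canonical mode
   and save the previous settings in *saved. Returns 0 on success, -1 if
   `fd` is not a terminal or a call fails. */
int disable_line_editing(int fd, struct termios *saved) {
    struct termios raw;

    if (!isatty(fd) || tcgetattr(fd, saved) != 0) {
        return -1;
    }
    raw = *saved;
    raw.c_lflag &= ~(tcflag_t)(ICANON | ECHO);  /* no line assembly, no echo */
    raw.c_cc[VMIN] = 1;                         /* a read returns after one byte */
    raw.c_cc[VTIME] = 0;                        /* no read timeout */
    return tcsetattr(fd, TCSANOW, &raw) == 0 ? 0 : -1;
}

/* Hypothetical helper: restore settings saved by disable_line_editing(). */
int restore_terminal(int fd, const struct termios *saved) {
    return tcsetattr(fd, TCSANOW, saved) == 0 ? 0 : -1;
}
```

After a successful disable_line_editing(STDIN_FILENO, &saved), getchar() should return as soon as a key is pressed instead of waiting for Enter. Backspace editing is then no longer handled by the terminal.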

Investigating this feels like peeling an onion: as you go through the layers of skin, you find more layers to scrape off and it makes you cry :)

chqrlie
  • 131,814
  • 10
  • 121
  • 189
1

Since it is an example from K&R, and since it is not the central issue of your question, let's set aside the fact that char c should be int c (because getchar() returns an int). You'll find plenty of questions explaining it better.

The while loop behavior is

while (condition_is_true)
    Do_Something;

Your condition contains an assignment, that's always executed:

c=getchar()

It is part of a logical check, (c=getchar()) != EOF, which in your program stays true until end of input is signaled (you are reading from stdin interactively). So the condition after the && is evaluated (short-circuiting guarantees that the operands of a logical AND are evaluated from left to right, stopping at the first one that is false).

The latter condition is c != '\n'. It will be true for all the characters in your "Hello" string, and all of them will be stored in your line array. But as soon as you enter a newline, since the preceding assignment put \n into c, the condition becomes false, and execution exits the loop (so the newline won't be stored in the line array).

Then, and only then, the string line will be printed.
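
The short-circuit rule can be checked with a small sketch. The helper names are made up, and a counter records how often the right-hand operand of && actually runs.

```c
#include <assert.h>
#include <stdio.h>

static int newline_tests;  /* how many times the right-hand operand ran */

static int not_newline(int c) {
    newline_tests++;
    return c != '\n';
}

/* Hypothetical helper: evaluate the loop condition for one character
   that has already been read; returns 1 if the loop would continue. */
int loop_continues(int c) {
    return c != EOF && not_newline(c);
}

/* Self-check documenting the expected behaviour. */
void short_circuit_demo(void) {
    newline_tests = 0;
    assert(loop_continues('H') == 1);   /* both operands true */
    assert(loop_continues('\n') == 0);  /* right operand false */
    assert(newline_tests == 2);
    assert(loop_continues(EOF) == 0);   /* left operand false ... */
    assert(newline_tests == 2);         /* ... so the right one never ran */
}
```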

Roberto Caboni
  • 7,252
  • 10
  • 25
  • 39
  • okay, it is a bit clearer. But `the previous assignment put \n into c`, so basically while I am writing, c is getting updated to the characters and checked against the conditions, and then it gets reset to H and the loop starts running? – mr.loop Jan 24 '21 at 10:24
  • @mr.loop whenever you write something in stdin, `getchar` returns it and it is assigned to `c`. If it is not a newline, the loop executes and it is blocked again at getchar until a new character is inserted. This goes on until the inserted char is a newline. – Roberto Caboni Jan 24 '21 at 10:30
-1

It's because of the implementation of getchar(). This function first lets you write to the buffer until you press the enter key, and then it gets only one char from the buffer.

If you want to get one char directly from the keyboard, you can use the non-standard header conio.h (available with some DOS/Windows compilers), which provides functions such as getch().

Have fun learning C and don't be afraid to ask questions!

  • `until you press Enter key`. Where did that come from? Because that is not getchar() default behavior; try removing the newline condition. Now, again, I was saying that c doesn't have anything while I am writing, or at best it has H, so why does `c != '\n'` apply? – mr.loop Jan 24 '21 at 10:17
  • 1
    replacing `'\n'` with `'a'` would still yield a working program, which would read up to and not-including the 'a'. Yes, the buffer would only be emptied on newline, but that is not the OPs question. – tucuxi Jan 24 '21 at 10:18