
I am trying to learn C on my own and I'm a bit confused about `getchar` and `putchar`:

Version 1:

#include <stdio.h>

int main(void)
{
    char c;
    printf("Enter characters : ");
    while((c = getchar()) != EOF){
      putchar(c);
    }
    return 0;
}

Version 2:

#include <stdio.h>

int main(void)
{
    int c;
    printf("Enter characters : ");
    while((c = getchar()) != EOF){
      putchar(c);
    }
    return 0;
}

The C library function int putchar(int c) writes a character (an unsigned char), specified by the argument c, to stdout.

The C library function int getchar(void) gets a character (an unsigned char) from stdin. This is equivalent to getc with stdin as its argument.

Does this mean putchar() accepts both int and char, or only one of them? And for getchar(), should we use an int or a char?

ragmha

  • Possible duplicate of [Why must the variable used to hold getchar's return value be declared as int?](http://stackoverflow.com/questions/18013167/why-must-the-variable-used-to-hold-getchars-return-value-be-declared-as-int) – phuclv May 15 '17 at 12:15
  • @LưuVĩnhPhúc or the opposite way. The age of a question doesn't matter – Antti Haapala -- Слава Україні May 19 '17 at 10:05

2 Answers


TL;DR:

  • char c; c = getchar(); is wrong, broken and buggy.
  • int c; c = getchar(); is correct.

This applies to getc and fgetc as well, perhaps even more so, because with those one often reads until the end of a file.
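
For example, here is a minimal sketch of the same pattern with fgetc; the file name input.txt is only an illustration:

#include <stdio.h>

int main(void)
{
    /* "input.txt" is just a placeholder name for this sketch */
    FILE *fp = fopen("input.txt", "rb");
    int c;                          /* int, not char, so that EOF fits too */

    if (fp == NULL)
        return 1;

    while ((c = fgetc(fp)) != EOF)  /* end-of-file is detected reliably */
        putchar(c);

    fclose(fp);
    return 0;
}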


Always store the return value of getchar (and of fgetc, getc, and even putchar) initially in a variable of type int.

The argument to putchar can be any of int, char, signed char or unsigned char; its type doesn't matter, and all of them work the same, even though one type might result in positive and another in negative integers being passed for character codes at and above \200 (128).


The reason why you must use int to store the return value of both getchar and putchar is that when the end-of-file condition is reached (or an I/O error occurs), both of them return the value of the macro EOF, which is a negative integer constant (usually -1).

For getchar, if the return value is not EOF, it is the read unsigned char zero-extended to an int. That is, assuming 8-bit characters, the values returned can be 0...255 or the value of the macro EOF; again assuming 8-bit char, there is no way to squeeze these 257 distinct values into the 256 available in a char so that each of them could be identified uniquely.
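
Put differently, once the EOF test has passed, the value is guaranteed to fit in an unsigned char, so narrowing it only then is safe; a minimal sketch:

#include <stdio.h>

int main(void)
{
    int c;                                    /* holds 0...255 or EOF */

    while ((c = getchar()) != EOF) {
        unsigned char ch = (unsigned char)c;  /* safe: c is 0...255 here */
        putchar(ch);
    }
    return 0;
}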


Now, if you stored it into char instead, the effect would depend on whether the character type is signed or unsigned by default! This varies from compiler to compiler and architecture to architecture. If char is signed, and assuming EOF is defined as -1, then both EOF and the input character '\377' would end up stored as -1 and would compare equal to EOF; both are sign-extended to (int)-1.

On the other hand, if char is unsigned (as it is by default on ARM processors, including Raspberry Pi systems, and this seems to be true for AIX too), there is no value that c could hold that would compare equal to EOF; instead of breaking out on EOF, your code would output a \377 character for every EOF returned and would loop forever.

The danger here is that with signed chars the code seems to work correctly even though it is still horribly broken: one of the legal input values is interpreted as EOF. Furthermore, C89, C99 and C11 do not mandate a value for EOF; they only say that EOF is a negative integer constant; thus instead of -1 it could just as well be, say, -224 on a particular implementation, in which case EOF truncated to char would collide with the space character ('\040', 32) instead of '\377'.
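
Both failure modes can be shown directly with casts; a small sketch, assuming 8-bit char and EOF defined as -1:

#include <stdio.h>

int main(void)
{
    signed char   s = (signed char)0xFF;    /* the legal character '\377' */
    unsigned char u = (unsigned char)EOF;   /* EOF squeezed into a char   */

    printf("%d\n", s == EOF);   /* prints 1: '\377' is mistaken for EOF */
    printf("%d\n", u == EOF);   /* prints 0: EOF can never match        */
    return 0;
}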

gcc has the switch -funsigned-char which can be used to make the char unsigned on those platforms where it defaults to signed:

% cat test.c
#include <stdio.h>

int main(void)
{
    char c;
    printf("Enter characters : ");
    while ((c = getchar()) != EOF){
      putchar(c);
    }
    return 0;
}

Now we run it with signed char:

% gcc test.c && ./a.out
Enter characters : sfdasadfdsaf
sfdasadfdsaf
^D
%

Seems to be working right. But with unsigned char:

% gcc test.c -funsigned-char && ./a.out                   
Enter characters : Hello world
Hello world
���������������������������^C
%

That is, I tried to press Ctrl-D there many times, but a \377 was printed for each EOF instead of breaking the loop.

Now, again, in the signed char case the code cannot distinguish between the character \377 (255) and EOF on Linux, which breaks it for binary data and such:

% gcc test.c && echo -e 'Hello world\0377And some more' | ./a.out 
Enter characters : Hello world
%

Only the first part up to the \0377 escape was written to stdout.
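
With int c the two cases stay distinct, and the program can even tell them apart explicitly; a sketch, assuming 8-bit char (the <FF> marker is just for illustration):

#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        if (c == 0xFF)              /* the byte '\377': a real character */
            fputs("<FF>", stdout);
        else
            putchar(c);
    }
    /* reached only on genuine end-of-file or a read error */
    return 0;
}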


Beware that comparisons between character constants and an int containing the unsigned character value might not work as expected: e.g. the character constant 'ä' in ISO 8859-1 would mean the signed value -28 (its unsigned code is 228, 0xE4). So, if you write code that reads input until 'ä' in the ISO 8859-1 codepage, you'd do

int c;
while ((c = getchar()) != EOF){
    if (c == (unsigned char)'ä') {
        /* ... */
    }
}
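
(Without the cast, c == 'ä' could never be true on such a system: after the EOF check c is always in 0...255, while 'ä' would be -28.)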

Due to integer promotions, all char values fit into an int and are automatically promoted in function calls; thus you can pass any of int, char, signed char or unsigned char to putchar as an argument (just don't use a char to store its return value), and it will work as expected.

The actual value passed in the int might be positive or even negative; for example, the character constant '\377' would be negative on an 8-bit-char system where char is signed; however, putchar (or fputc, actually) will convert the value to an unsigned char. C11 7.21.7.3p2:

2 The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream [...]

(emphasis mine)

I.e. fputc is guaranteed to convert the given c as if by (unsigned char)c.
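
So, under that rule, all of the following should write the same single 0xFF byte to stdout (a sketch, assuming 8-bit char):

#include <stdio.h>

int main(void)
{
    putchar(0xFF);    /* 255, already a valid unsigned char value       */
    putchar(-1);      /* converted: (unsigned char)-1 == 0xFF           */
    putchar('\377');  /* negative on signed-char systems, but same byte */
    putchar('\n');
    return 0;
}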

  • I don't understand yet why the code is horribly broken with signed char. After all, doesn't it store bits, just like int? I hope to understand what this danger is. After all, if you store the binary value 10 in a char or an int variable, it is exactly the same binary number, and therefore the same integer number. – Judismar Arpini Junior Feb 12 '16 at 07:14
  • @DavidSchwartz: You read the value of `getchar()` into an `int`, using, for example, `int c = getchar();`. Now, if you read an ordinary character, then the value in `c` is non-negative and between 0 and UCHAR_MAX. If you got EOF, then the value in `c` is negative (usually `-1` though the standard does not mandate that value — EOF must be negative, that's all). So you can tell the difference easily — when you read the value correctly. _[…continued…]_ – Jonathan Leffler Feb 12 '16 at 07:21
  • _[…continuation…]_ If you store the value from `getchar()` directly into a signed `char` (whether that's explicitly `signed char` or plain `char` on a machine where it is a signed type), then you have one valid character value that can be confused with EOF. Often, that character code is 0xFF, which maps to -1 when the machine uses two's complement arithmetic and 8-bit `char` and EOF is `-1`. The converse problem occurs if plain `char` is an unsigned type; then storing `EOF` directly is never equal to EOF — given `unsigned char c;`, testing `if ((c = getchar()) == EOF)` is never true. – Jonathan Leffler Feb 12 '16 at 07:24
  • @JonathanLeffler, I understand everything you wrote. What I don't get is the danger, since signed char doesn't have a problem at all with EOF being negative and -1. I just feel like saying this: int will have the exact same problem in case it is **unsigned int** . So what's the difference? – Judismar Arpini Junior Feb 12 '16 at 07:31
  • Especially if you live in Turkey, where the letter ÿ (y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) is used, then typing that letter into code that saves the result of `getchar()` into a signed `char` type would be detected as EOF, just as if you'd typed Control-D (Unix) or Control-Z (Windows) — those indicate 'no more data' or EOF. So, the problem is that a legitimate character (ÿ) is treated as EOF when it should not be. It's almost as bad as never treating anything as EOF. – Jonathan Leffler Feb 12 '16 at 07:55
  • The standard specifies (for [`fgetc()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fgetc.html) but `getchar()` is implemented in terms of `getc(stdin)` and `getc()` is equivalent to `fgetc()`) that: _If the end-of-file indicator for the input stream pointed to by `stream` is not set and a next character is present, the `fgetc` function obtains that character as an `unsigned char` converted to an `int` … If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the `fgetc` function returns EOF._ – Jonathan Leffler Feb 12 '16 at 07:58
  • The concern about `'ä'` is new to me. It appears that C11 §6.4.4.4 10 is the relevant citation: "If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type `char` whose value is that of the single character or escape sequence is converted to type `int`." – chux - Reinstate Monica Apr 14 '17 at 15:21
  • I had an argument with someone about whether or not you need to use `int` vs `char` with `getchar()`, in some comments a while back. Next time I will just point them here! – ad absurdum Jun 14 '17 at 21:35

Always use int to save the character from getchar(), as the EOF constant is of int type. If you use char, then the comparison against EOF is not reliable.

You can safely pass char to putchar() though, as it will be promoted to int automatically.

Note: Technically, using char will work in most cases, but then you can't have the 0xFF character in the input, as it would be interpreted as EOF due to the type conversion. To cover all cases, always use int. As @Ilja put it: int is needed to represent all 256 possible character values plus EOF, 257 possible values in total, which cannot be stored in a char.
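
A quick sketch with limits.h makes the count concrete:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* a char can hold UCHAR_MAX + 1 distinct values (256 with 8-bit char),
       but getchar can return one value more: every character plus EOF */
    printf("distinct char values   : %d\n", UCHAR_MAX + 1);
    printf("possible getchar values: %d\n", UCHAR_MAX + 2);
    printf("EOF on this system     : %d\n", EOF);
    return 0;
}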

JohnLM
  • " If you use char then the comparison against EOF is not correct." Not sure if I get it. A char variable stores an integer number, so it's just the same, in this case, as using int. – Judismar Arpini Junior Feb 12 '16 at 07:00
  • `EOF` is (int)-1, which is out of range for `char` type. – JohnLM Feb 12 '16 at 07:05
  • Signed char type range is -128 to 127. – Judismar Arpini Junior Feb 12 '16 at 07:11
  • on a 32-bit machine `(int)-1` is `0xFFFFFFFF` which is out of range for a `char`, but `(signed char)-1` which is `0xFF` will still get type promoted to `int` during comparison. That's reason why it _usually_ works, but you can't have `0xFF` as valid characters in the stream if you're not using `int` to store the character. I.e., using `int` to store it will be saved as `0x000000FF` which is distinct from `EOF`. – JohnLM Feb 12 '16 at 07:16
  • "Technically using char will work in most cases" is wrong, standard does not mandate that `char` is signed. – Antti Haapala -- Слава Україні Feb 12 '16 at 07:27
  • It will work in _most_ cases. I'm not saying it's safe to do so. – JohnLM Feb 12 '16 at 07:28
  • It should say "use int if you expect 0xFF in streams or if you do not know whether `char` is signed or unsigned", that is to say, "always". – Antti Haapala -- Слава Україні Feb 12 '16 at 07:29
  • (not that the `EOF` couldn't be defined something else than `-1`). – Antti Haapala -- Слава Україні Feb 12 '16 at 07:41
  • True on both counts, but I prefer to keep the answer terse. – JohnLM Feb 12 '16 at 07:50
  • So you're telling me it's better to read a char value and store it in an int variable? All you have to do is use signed char if that's required. @AnttiHaapala, you still failed to prove your point in your own answer, and I asked for it many times, in order to understand my mistake. What I concluded is: you prefer to defend your programming habit with fallacies. – Judismar Arpini Junior Feb 12 '16 at 16:25
  • @JudismarJunior "So you're telling me it's better to read a char value and store it in a int variable?" Yes! Well, at least until you compare it against `EOF`. After that, you're good to store it as `char`. But I'm also saying `getchar()` returns `int`, and for a reason. – JohnLM Feb 12 '16 at 17:45
  • @JohnLM, I agree it returns *int*. I was trying to understand what was the problem you guys were talking about, but I decided to leave this behind. I am unable to understand, I suppose. I know that comparing a *char* with *int* never had any problem, due to implicit casting (just like comparing int to float). – Judismar Arpini Junior Feb 12 '16 at 20:24
  • @JudismarJunior: I suggest you try to grasp this concept. Otherwise your C programs will be buggy, buggy, buggy. – Mikko Ohtamaa Feb 12 '16 at 23:02
  • @JohnLM `(int)-1` is in range for signed char. C uses value semantics. The problem is that it is not distinguishable from `(char)-1` when both have been assigned to a variable of the same type. – M.M Feb 12 '16 at 23:17
  • @JudismarJunior the accepted answer has a great way of putting this: assuming 8-bit chars you are trying to represent 257 symbols with a type able to represent only 256 symbols. You need 256 chars + EOF. Int can represent those. – Ilja Everilä Feb 13 '16 at 10:08
  • @Ilja, I don't understand: 256 chars + EOF? Did you mean 256 symbols? I still can't understand, given EOF is (int)-1. but I appreciate your comment. I'll try to read the whole answer again. Thanks! – Judismar Arpini Junior Feb 13 '16 at 13:28
  • @Ilja, I've got it now. Thanks again. – Judismar Arpini Junior Feb 13 '16 at 13:37