66

The newline character is represented by "\n" in C code. Is there an equivalent for the end-of-file (EOF) character?

static_rtti
  • 21
    The question presumes incorrectly that "EOF" is a character, when in fact it is a *condition*. In fact, if it **were** a character, it wouldn't be the end anymore, would it? – Kerrek SB Sep 12 '12 at 13:43
  • 2
    There is no EOF character. EOF is an *out-of-bounds* value used to indicate an EOF condition. It is not equal to any character value (as read by getc() et.al.) – wildplasser Sep 12 '12 at 13:43
  • 5
    @Kerrek SB: you are correct, but note that some operating systems in the past did actually have an EOF character which was embedded in the file, e.g. CP/M used Control-Z for this. – Paul R Sep 12 '12 at 13:49
  • 3
    Questions answered like "the question is "too obvious" are not as helpful as answers that show kindness and give guidance. This question about EOF and SOF vexed me until I dug into it. Here is a good article that discusses this exact point and answers it in more detail with code examples... https://ruslanspivak.com/eofnotchar/ – Rich Lysakowski PhD Jan 12 '22 at 17:35

11 Answers

106

EOF is not a character (in most modern operating systems). It is simply a condition that applies to a file stream when the end of the stream is reached. The confusion arises because a user may signal EOF for console input by typing a special character (e.g. Control-D in Unix, Linux, et al.). This character is not seen by the running program, however: it is caught by the operating system, which in turn signals EOF to the process.

Note: in some very old operating systems EOF was a character, e.g. Control-Z in CP/M, but this was a crude hack to avoid the overhead of maintaining actual file lengths in file system directories.

Paul R
  • 3
    [The C standard does not guarantee that EOF is not a character.](http://stackoverflow.com/a/3861506/298225) – Eric Postpischil Sep 12 '12 at 14:58
  • 3
    @EricPostpischil: the C standard does (indirectly) guarantee that the return value from `getchar()` et al is either a valid character or a distinct code, EOF, that is not the code for a valid character. _`EOF` which expands to an integer constant expression, with type `int` and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;_' and '_the `fgetc` function obtains [the next] character as an `unsigned char` converted to an `int`_'. On any system where `sizeof(char) != sizeof(int)`, therefore, EOF is distinct from any `char`. – Jonathan Leffler Sep 12 '12 at 15:42
  • The text you quote does not indicate that EOF must be different from any character value. It is common that EOF is not equal to any character value, but it is not guaranteed by the C standard. – Eric Postpischil Sep 12 '12 at 16:00
  • 11
    Also note that even today in Windows, Ctrl-Z in a file will trigger an EOF condition if it's opened in text mode. Microsoft takes their backwards compatibility with CP/M very seriously. – Michael Burr Sep 13 '12 at 05:50
  • 1
    @Michael Burr: gosh, I didn't know that - so in some ways we really haven't come all that far from the CP/M era. – Paul R Sep 13 '12 at 06:08
  • 2
    @MichaelBurr: Are you sure that's Windows and not the compiler-specific stdio implementation? AFAIK, Windows doesn't even have a "opened in text mode" condition. – Ben Voigt Aug 02 '14 at 05:23
  • @BenVoigt - The EOF control character (ASCII char 0x1a) is still treated as end-of-file by built-in Windows command line utilities. For instance, the `copy` command, when used with the /a (ASCII mode) option, will append the EOF character code at the end of files that get appended. Similarly, the `type` command obeys EOF characters that occur in text files. It is a holdover from MS-DOS. – vercellop May 22 '18 at 01:25
  • 2
    @vercellop: Yes, the command interpreter has a lot of DOS backward compatibility. But while it is bundled with Windows, it's just a user-mode tool, not part of the OS. – Ben Voigt May 22 '18 at 01:28
17

EOF is not a character. It can't be: A (binary) file can contain any character. Assume you have a file with ever-increasing bytes, going 0 1 2 3 ... 255 and once again 0 1 ... 255, for a total of 512 bytes. Whichever one of those 256 possible bytes you deem EOF, the file will be cut short.

That's why getchar() et al. return an int. The possible return values are those that a character can have (as an unsigned char converted to int), plus a genuine int value EOF (defined in stdio.h). That's also why converting the return value to a char before checking for EOF will not work.

Note that some protocols have "EOF" "characters." ASCII has "End of Text", "End of Transmission", "End of Transmission Block" and "End of Medium". Other answers have mentioned old OS'es. I myself input ^D on Linux and ^Z on Windows consoles to stop giving programs input. (But files read via pipes can have ^D and ^Z characters anywhere and only signal EOF when they run out of bytes.) C strings are terminated with the '\0' character, but that also means they cannot contain the character '\0'. That's why all C non-string data functions work using a char array (to contain the data) and a size_t (to know where the data ends).

Edit: The C99 standard §7.19.1.3 states:

The macros are [...]
EOF
which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

aib
  • [The C standard does not guarantee that EOF is not a character.](http://stackoverflow.com/a/3861506/298225) – Eric Postpischil Sep 12 '12 at 14:59
  • 1
    Your edit does not show that EOF does not equal a character value. The fact that EOF indicates end-of-file does not preclude it from equalling a char value. The fact that EOF is negative does not preclude it from equalling a char value. (Allowing EOF to be a character value is a nuisance but, as the answer I linked to states, does not preclude a C implementation from conforming to the C standard.) – Eric Postpischil Sep 12 '12 at 16:02
  • 1
    That doesn't change the problem. People doing `((charVar = getchar()) == EOF)` will see incorrect behavior. What you're saying is that they may get a premature, false EOF when they read that `char` value which happens to equal `EOF` when promoted to `int`, instead of looping forever because no `char` will ever equal `EOF`. The solution is still the same: `((intVar = getchar()) == EOF)` – aib Sep 13 '12 at 00:32
  • You should've said "The C standard does not guarantee that `EOF` does not equal a `char` value." Indeed, even if an implementation uses the same type of `char` and `int`, they are still distinct types for the standard and those conforming to it. – aib Sep 13 '12 at 00:41
  • @EricPostpischil So that means that all these 3 most upvoted answers are fundamentally wrong? They all 3 start saying variations of "EOF is not a character". Do you know another source with the correct answer then? – Santropedro Jul 20 '19 at 00:48
  • 1
    @Santropedro: Yes, the answers are wrong. Various standard library routines return a character as an `unsigned char` converted to an `int`, so that must have a non-negative value, which cannot equal `EOF` because `EOF` is negative. However, one of the definitions of “character” in the C standard is “bit representation that fits in a byte.” And many people handle characters using the `char` type, which may be signed. (In fact, `fgets` takes a `char *`.) Then it may be possible to have a `char x` whose value equals `EOF` but which could be validly printed with `fputc` and other functions. – Eric Postpischil Jul 20 '19 at 01:07
  • 1
    @Santropedro: What this means, to correctly answer the question, is that one should detect `EOF` by using the return value from functions such as `fgetc`, which returns either a character as an `unsigned char` converted to `int` or `EOF`. That will work in all but the exotic hypothetical C implementations discussed in the link I provided. (To write code even for those implementations, use the `feof` function.) But one should not assume that a `char` value does not equal `EOF`. – Eric Postpischil Jul 20 '19 at 01:15
11

No. EOF is not a character, but a state of the filehandle.

While there are control characters in the ASCII character set that represent the end of data, these are not used to signal the end of files in general. One example is EOT (^D), which in some cases signals almost the same thing.

When the standard C library uses a signed integer to return characters and uses -1 for end of file, this is actually just a signal that no character could be read, because of either end of file or a read error. I don't have the C standard available, but to quote SUSv3:

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgetc() shall return EOF. If a read error occurs, the error indicator for the stream shall be set, fgetc() shall return EOF, and shall set errno to indicate the error.

pmakholm
6

I've read all the comments. It's interesting to notice what happens when you print out this:

printf("\nInteger =    %d\n", EOF);             //OUTPUT = -1
printf("Decimal =    %d\n", EOF);               //OUTPUT = -1
printf("Octal =  %o\n", EOF);                   //OUTPUT = 37777777777
printf("Hexadecimal =  %x\n", EOF);             //OUTPUT = ffffffff
printf("Double and float =  %f\n", EOF);        //OUTPUT = 0.000000
printf("Long double =  %Lf\n", EOF);            //OUTPUT = 0.000000
printf("Character =  %c\n", EOF);               //OUTPUT = nothing

As we can see here, EOF is NOT a character (whatsoever).

carloswm85
  • You're getting UB because you're using the wrong format specifier. `EOF` is not a float, double or long double so obviously printing it as a floating-point type doesn't work – phuclv Aug 22 '20 at 16:07
  • @phuclv Can you tell me what's UB? – carloswm85 Sep 29 '20 at 00:43
  • 1
    undefined behavior [What happens when I use the wrong format specifier?](https://stackoverflow.com/q/16864552/995714) – phuclv Sep 29 '20 at 01:21
  • 1
    undefined behaviour means that it's not defined in the C standard but not that there is never a reason for the behaviour. When you print the `double` value, the library function reads 8 bytes from the stack, the last 4 of which are the `0xFFFFFFFF` you can see from `%x`, and it interprets those 8 bytes as a `double`. It's most likely seeing a very small non-zero denormalised value that prints as 0.000000 because there are only 6 decimal places. The other 4 bytes are probably `0x00` here but they could be *anything*; hence "undefined behaviour" and you might see other random nonsense. – szmoore Aug 27 '21 at 04:51
3

The EOF character recognized by the command interpreter on Windows (and MSDOS, and CP/M) is 0x1a (decimal 26, aka Ctrl+Z aka SUB)

It can still be used today, for example to mark the end of a human-readable header in a binary file: if the file begins with "Some description\x1a", the user can dump the file content to the console using the TYPE command and the dump will stop at the EOF character, i.e. print Some description and stop, instead of continuing with the garbage that follows.

phuclv
Axel Rietschin
1

This is system dependent but often -1. See here

onoma
1

I think it may vary from system to system but one way of checking would be to just use printf

#include <stdio.h>
int main(void)
{
    printf("%d", EOF);
    return 0;
}

I did this on Windows and -1 was printed to the console. Hope this helps.

Keith Miller
1

The value of EOF can't be confused with any real character.

If a = getchar(), then we must declare a with a type big enough to hold any value that getchar() returns. We can't use char, since a must be able to hold EOF in addition to every character value.

Luke Taylor
Harsh Vardhan
  • This answer is ambiguous. While the first part is correct, the second part that describes the size of `a` is hard to understand. I edited your post to allow for a little more clarity. – Luke Taylor Mar 27 '16 at 13:40
1

The answer is NO, but...

You may be confused by the behavior of fgets().

From http://www.cplusplus.com/reference/cstdio/fgets/ :

Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.

betontalpfa
1

I have been researching the EOF signal a lot. In The C Programming Language by Kernighan and Ritchie, it is first encountered when the putchar() and getchar() functions are introduced. It basically marks the end of the character input.

For example, let us write a program that asks for two numerical inputs and prints their sum. You'll notice that after each numerical input you press Enter to signal that you have completed the input action. But when working with character input, Enter is read as just another character ('\n', the newline character). To mark the end of the input, you type ^Z (Ctrl + Z on the keyboard) on a new line and then press Enter. That signals the program to carry on with the statements that follow.

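#include <stdio.h>

int main(void)
{
    int c;      /* int, not char: must hold every character value plus EOF */
    int i = 0;

    printf("INPUT:\t");
    c = getchar();

    /* Count every character until getchar() reports end of input. */
    while (c != EOF)
    {
        ++i;
        c = getchar();
    }

    printf("NUMBER OF CHARACTERS %d.", i);

    return 0;
}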

The code above counts the characters in the input, including '\n' (newline) and '\t' (tab) characters. If you don't want to count the newline characters, do this:

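#include <stdio.h>

int main(void)
{
    int c;      /* int, not char: must hold every character value plus EOF */
    int i = 0;

    printf("INPUT:\t");
    c = getchar();

    while (c != EOF)
    {
        /* Skip newlines; count everything else. */
        if (c != '\n')
        {
            ++i;
        }

        c = getchar();
    }

    printf("NUMBER OF CHARACTERS %d.", i);

    return 0;
}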

Now the main point: how to give the input. It's simple: type all the text you want, then go to a new line, enter ^Z, and press Enter again. (That is the Windows console convention; on Linux, ^D plays the same role.)

0

There is the constant EOF of type int, found in stdio.h. There is no equivalent character literal specified by any standard.

Lundin