25

I have managed so far with the understanding that EOF is a special character inserted automatically at the end of a text file to indicate its end. But I now feel the need for more clarification on this. I checked Google and the Wikipedia page for EOF, but they couldn't answer the following, and there are no exact Stack Overflow links for this either. So please help me with this:

  • My book says that binary mode files keep track of the end of file from the number of characters recorded in the directory entry of the file (in contrast to text files, which have a special EOF character to mark the end). So what is the story of EOF in the context of binary files? I am confused because in the following program I successfully use a `!= EOF` comparison while reading from an .exe file in binary mode:

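     #include <stdio.h>
     #include <stdlib.h>

     int main(void)
     {
         int ch;   /* int, not char, so EOF stays distinct from every byte value */
         FILE *fp1, *fp2;

         fp1 = fopen("source.exe", "rb");
         fp2 = fopen("dest.exe", "wb");

         if (fp1 == NULL || fp2 == NULL)
         {
             fprintf(stderr, "Error opening files\n");
             exit(EXIT_FAILURE);
         }

         /* Copy byte by byte until getc() reports end-of-file (or an error) */
         while ((ch = getc(fp1)) != EOF)
             putc(ch, fp2);

         fclose(fp1);
         fclose(fp2);

         return 0;
     }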
    
  • Is EOF a special "character" at all? Or is it a condition as Wikipedia says, a condition where the computer knows when to return a particular value like -1 (EOF on my computer)? Example of such "condition" being when a character-reading function finishes reading all characters present, or when character/string I/O functions encounter an error in reading/writing?

    Interestingly, the Stack Overflow tag for EOF blended both of those definitions. The tag for EOF said "In programming realm, EOF is a sequence of byte (or a character) which indicates that there are no more contents after this.", while it also said in the "about" section that "End of file (commonly abbreviated EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream."

But I have a strong feeling that EOF isn't a character, since many other functions also seem to return it when they encounter an error during I/O.

It will be really nice of you if you can clear the matter for me.

Mat
Thokchom

5 Answers

31

The various EOF indicators that C provides to you do not necessarily have anything to do with how the file system marks the end of a file.

Most modern file systems know the length of a file because they record it somewhere, separately from the contents of the file. The routines that read the file keep track of where you are reading and they stop when you reach the end. The C library routines generate an EOF value to return to you; they are not returning a value that is actually in the file.
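As a rough illustration of those two views (a sketch, not how any particular library is implemented): `stream_size` asks the system for the recorded length via `fseek`/`ftell`, while `count_until_eof` counts what `getc` hands back before it returns `EOF`. All three function names are made up for this example. The sketch assumes a seekable binary stream; note that the C standard does not strictly require binary streams to support `SEEK_END`, although common implementations do.

```c
#include <assert.h>
#include <stdio.h>

/* Length of a seekable binary stream as recorded by the system, or -1 on
   failure. Leaves the stream positioned at the start.
   Assumption: the implementation supports SEEK_END on this stream. */
long stream_size(FILE *fp)
{
    long size;

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);                  /* offset of the end = length in bytes */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;
    return size;
}

/* Number of bytes getc() delivers before it returns EOF. */
long count_until_eof(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    while (getc(fp) != EOF)
        n++;
    return n;
}

/* Hypothetical self-check: returns 1 if both views agree on a 5-byte file. */
int sizes_agree(void)
{
    FILE *fp = tmpfile();              /* temporary file in binary update mode */
    long recorded, counted;

    if (fp == NULL)
        return 0;
    fwrite("hello", 1, 5, fp);
    recorded = stream_size(fp);        /* also rewinds the stream */
    counted = count_until_eof(fp);
    fclose(fp);
    return recorded == 5 && counted == 5;
}
```

Nothing is stored in the file to make the second function stop; the library simply runs out of bytes and reports that by returning `EOF`.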

Note that the EOF returned by C library routines is not actually a character. The C library routines generally return an int, and that int is either a character value or an EOF. E.g., in one implementation, the characters might have values from 0 to 255, and EOF might have the value −1. When the library routine encountered the end of the file, it did not actually see a −1 character, because there is no such character. Instead, it was told by the underlying system routine that the end of file had been reached, and it responded by returning −1 to you.
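To make that concrete, here is a small sketch (the function name `ff_byte_is_data` is hypothetical) that stores the byte 0xFF in a temporary file and reads it back. It assumes the usual 8-bit bytes, where `getc` returns 255 for that byte, and only the following call, with nothing left to read, returns `EOF`.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the single byte 0xFF to a temporary binary file, reads it back,
   and returns 1 if getc() yields 255 for the data byte and EOF only after it. */
int ff_byte_is_data(void)
{
    FILE *fp = tmpfile();      /* temporary file in binary update mode */
    int first, second;

    assert(EOF < 0);           /* the standard guarantees EOF is negative */
    if (fp == NULL)
        return 0;

    putc(0xFF, fp);
    rewind(fp);

    first = getc(fp);          /* the byte, converted via unsigned char: 255 */
    second = getc(fp);         /* nothing left to read: EOF */
    fclose(fp);

    return first == 0xFF && first != EOF && second == EOF;
}
```

This is also why the result of `getc` belongs in an `int`. If it were stored in a `char`, the 0xFF byte could compare equal to `EOF` on platforms where `char` is signed, and a copy loop would stop early.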

Old and crude file systems might have a value in the file that marks the end of file. For various reasons, this is usually undesirable. In its simplest implementation, it makes it impossible to store arbitrary data in the file, because you cannot store the end-of-file marker as data. One could, however, have an implementation in which the raw data in the file contains something that indicates the end of file, but data is transformed when reading or writing so that arbitrary data can be stored. (E.g., by “quoting” the end-of-file marker.)

In certain cases, things like end-of-file markers also appear in streams. This is common when reading from the terminal (or a pseudo-terminal or terminal-like device). On Windows, pressing control-Z is an indication that the user is done entering input, and it is treated similarly to reaching an end-of-file. This does not mean that control-Z is an EOF. The software reading from the terminal sees control-Z, treats it as end-of-file, and returns end-of-file indications, which are likely different from control-Z. On Unix, control-D is commonly a similar sentinel marking the end of input.

Eric Postpischil
2

This should clear it up nicely for you.

Basically, EOF is just a macro with a predefined value that I/O functions return to indicate that there is no more data to be read (or that an error occurred).

Christopher Neylan
  • Thanks Christopher. That link you gave is from one of my favorite sites. Good Ol **Alex Allain**!! I had checked the **C** section, but never knew there's a cool FAQ part as well. – Thokchom May 21 '13 at 19:32
1

The file doesn't actually contain an EOF. EOF isn't a character of sorts - remember a byte can be between 0 and 255, so it wouldn't make sense if a file could contain a -1. The EOF is a signal from the operating system that you're using, which indicates the end of the file has been reached. Notice how getc() returns an int - that is so it can return that -1 to tell you the stream has reached the end of the file.

The EOF signal is treated the same for binary and text files - the actual definition of binary and text stream varies between the OSes (for example on *nix binary and text mode are the same thing.) Either way, as stated above, it is not part of the file itself. The OS passes it to getc() to tell the program that the end of the stream has been reached.

From the GNU C library:

This macro is an integer value that is returned by a number of narrow stream functions to indicate an end-of-file condition, or some other error situation. With the GNU C Library, EOF is -1. In other libraries, its value may be some other negative number.
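Since the same return value covers both "end of file" and "some other error", a caller that cares has to ask the stream which one happened. A minimal sketch using the standard `feof` and `ferror` calls follows; `drain` and `drain_demo` are hypothetical names for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Reads fp to exhaustion. Returns 0 if getc() stopped because of end-of-file,
   -1 if it stopped because of a read error. *count receives the bytes read. */
int drain(FILE *fp, long *count)
{
    int ch;

    assert(fp != NULL && count != NULL);
    *count = 0;

    while ((ch = getc(fp)) != EOF)
        (*count)++;

    /* EOF alone is ambiguous; the stream's indicators tell the cases apart. */
    if (ferror(fp))
        return -1;
    return feof(fp) ? 0 : -1;
}

/* Hypothetical self-check on a three-byte temporary file. */
int drain_demo(void)
{
    FILE *fp = tmpfile();
    long n = -1;
    int status;

    if (fp == NULL)
        return 0;
    fputs("abc", fp);
    rewind(fp);
    status = drain(fp, &n);
    fclose(fp);
    return status == 0 && n == 3;
}
```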

0

EOF is not a character. In this context, it's -1, which, technically, isn't a character (if you wanted to be extremely precise, it could be argued that it could be a character, but that's irrelevant in this discussion). EOF, just to be clear, is "End of File". While you're reading a file, you need to know when to stop; otherwise a number of things could happen, depending on the environment, if you try to read past the end of the file.

So, a macro was devised to signal that End of File has been reached in the course of reading a file, which is EOF. For getc this works because it returns an int rather than a char, so there's extra room to return something other than a char to signal EOF. Other I/O calls may signal EOF differently, such as by throwing an exception.
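Within C itself, `fread` is one call that signals the end differently: it never returns `EOF`, it just returns a count smaller than what was requested. A rough sketch of a block-wise copy built on that (the names `copy_blocks` and `copy_demo` and the 4096-byte buffer size are arbitrary choices for this example):

```c
#include <assert.h>
#include <stdio.h>

/* Copies in to out in blocks. Returns the number of bytes copied,
   or -1 on a read or write error. */
long copy_blocks(FILE *in, FILE *out)
{
    unsigned char buf[4096];
    size_t got;
    long total = 0;

    assert(in != NULL && out != NULL);

    /* fread() reports "no more data" through its count, not through EOF */
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got)
            return -1L;
        total += (long)got;
    }
    return ferror(in) ? -1L : total;
}

/* Hypothetical self-check: copies 7 arbitrary bytes between temporary files. */
int copy_demo(void)
{
    const unsigned char sample[] = {0x00, 0xFF, 0x1A, 'd', 'a', 't', 'a'};
    FILE *src = tmpfile();
    FILE *dst = tmpfile();
    long n = -1;

    if (src != NULL && dst != NULL) {
        fwrite(sample, 1, sizeof sample, src);
        rewind(src);
        n = copy_blocks(src, dst);
    }
    if (src != NULL)
        fclose(src);
    if (dst != NULL)
        fclose(dst);
    return n == 7;
}
```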

As a point of interest, in DOS (and maybe still on Windows?) an actual, physical character ^Z was placed at the end of a file to signal its end. So, on DOS, there actually was an EOF character. Unix never had such a thing.
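For what it's worth, in binary mode that ^Z byte (0x1A) is just data. The sketch below (the function name is hypothetical) writes `'A'`, ^Z, `'B'` to a temporary binary file and counts how many bytes come back before `EOF`. As far as I know, text mode in Microsoft's C runtime still treats ^Z as end-of-file on input, so the same bytes read through a text-mode stream there may stop after the first one; I have not verified that here.

```c
#include <assert.h>
#include <stdio.h>

#define CTRL_Z 0x1A   /* the byte DOS conventionally used as a text end marker */

/* Writes 'A', ^Z, 'B' to a temporary binary file and returns how many bytes
   getc() delivers before EOF, or -1 if the file could not be created. */
long bytes_read_past_ctrl_z(void)
{
    FILE *fp = tmpfile();      /* temporary file in binary update mode */
    long n = 0;

    assert(CTRL_Z != EOF);     /* ^Z is an ordinary byte value, not EOF */
    if (fp == NULL)
        return -1L;

    putc('A', fp);
    putc(CTRL_Z, fp);
    putc('B', fp);
    rewind(fp);

    while (getc(fp) != EOF)    /* binary mode: ^Z is read like any other byte */
        n++;

    fclose(fp);
    return n;
}
```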

Eric
-1

Well, it is pretty much possible to find the EOF of a binary file if you study its structure.

No, you don't need the OS to know the EOF of an executable file.

Almost every type of executable has a Page Zero which describes the basic information that the OS might need while loading the code into the memory and is stored as the first page of that executable.

Let's take the example of an MZ executable. https://wiki.osdev.org/MZ

Here at offset 2 we have the number of bytes used in the last page, and right after that, at offset 4, the total number of complete/partial pages (a page is 512 bytes). This information is generally used by the OS to safely load the code into memory, but you can use it to calculate the EOF of your binary file.

Algorithm:

 1. Start
 2. Parse the parameter and open the file (in binary mode) as per your requirement.
 3. Load the first page (page zero) into a (char) buffer of the page size and print it.
 4. Get the value at *((short int*)(buffer+4)) and store it in a loop variable called (short int) i.
 5. Get the value at *((short int*)(buffer+2)) and store it in a variable called (short int) l.
 6. i-- (page zero has already been read).
 7. While i is greater than 1, load and print (or do whatever you wanted to do with) one page of characters into the buffer, then decrement i.
 8. Once the loop has finished, load just `l` bytes into the buffer (a full page if `l` is 0) and again perform whatever you wanted to do.
 9. Stop
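The size calculation behind those steps can be sketched in C roughly as follows. This assumes the layout given on the linked osdev page: signature "MZ" at offset 0, bytes used in the last page at offset 2, page count at offset 4, little-endian 16-bit fields, 512-byte pages. The function name `mz_image_length` is made up for this example, and the result is only the length the header describes; files with data appended after the DOS image (PE executables, for instance) extend beyond it.

```c
#include <assert.h>
#include <stdio.h>

#define MZ_PAGE_SIZE 512L

/* Computes the image length described by the first 6 bytes of an MZ header.
   Returns -1 if the signature is not "MZ" or the page count is zero. */
long mz_image_length(const unsigned char hdr[6])
{
    unsigned last_page_bytes, pages;

    assert(hdr != NULL);
    if (hdr[0] != 'M' || hdr[1] != 'Z')
        return -1L;

    /* Assemble the little-endian fields byte by byte instead of casting the
       buffer, which sidesteps alignment and host-endianness issues. */
    last_page_bytes = hdr[2] | ((unsigned)hdr[3] << 8);   /* offset 2 */
    pages           = hdr[4] | ((unsigned)hdr[5] << 8);   /* offset 4 */

    if (pages == 0)
        return -1L;
    if (last_page_bytes == 0)          /* 0 means the last page is full */
        return (long)pages * MZ_PAGE_SIZE;
    return (long)(pages - 1) * MZ_PAGE_SIZE + (long)last_page_bytes;
}
```

For example, a header with 3 pages and 100 bytes in the last page describes 2 × 512 + 100 = 1124 bytes.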

If you're designing your own binary file format, then consider adding some sort of metadata at the start of the file, or a special character or word that denotes the end of the file.

And there's a good chance that the OS derives the size of the file from here, with the help of simple maths on this metadata, even though it might seem that the OS has stored it somewhere along with the other information it's expected to store (abstraction to reduce redundancy).

Project Zero