
I'm new to C, so sorry if my question sounds dumb. We know that EOF is a constant equal to -1. Say I create a new text file, type something like abcd, and then save the file. My questions are:

Q1 - Does the text editor append EOF to the content of the file, giving abcd-1?

Q2 - What happens if the content of the file is ab-1cd? Won't the library's reading function think the content of the file is ab and ignore the rest?

  • See [C11 Standard - 7.21.1 Input/output (p3)](http://port70.net/~nsz/c/c11/n1570.html#7.21.1p3) for the formal `EOF` definition. It is generally a macro that evaluates to `-1`, but the actual negative value isn't specified by the standard. It `"is returned by several functions to indicate end-of-file, that is, no more input from a stream"` – David C. Rankin Jul 21 '20 at 01:32

1 Answer

Q1 - Does the text editor append EOF to the content of the file, giving abcd-1?

Incorrect. EOF is not something that is stored in your file. It is a C-language construct to indicate that all file content has been read, and that the stream is at the end-of-file.

Q2 - What happens if the content of the file is ab-1cd?

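Irrelevant. There is no EOF character that can be inserted into the file stream. The text -1 in a file is just two ordinary characters, - (0x2D) and 1 (0x31), and it has nothing to do with the integer value -1 that EOF typically has.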

A file is almost always represented as a sequence of bytes, where a byte is an 8-bit unit which can represent values from 0 (0x00) to 255 (0xFF). This is what we call raw or binary data. Those values are assigned meaning according to the encoding of the file.

For example the ASCII encoding indicates that the value 65 (0x41) represents the character A, 66 B, and so on. The ASCII character set does have a number of control codes like 3 ETX (end-of-text), but these are obsolete and have no practical modern meaning.

A file stored in a filesystem has an intrinsic length which indicates the number of bytes in the file. Thus, the "end of file" occurs after the last byte, indicated by that length.

Interestingly (from Wikipedia):

Some operating systems such as CP/M tracked file length only in units of disk blocks and used Control-Z to mark the end of the actual text in the file.


So where does EOF come into play?

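EOF is a construct, a macro that expands to a negative int (typically -1), which is returned by <stdio.h> functions such as fgetc to indicate that the end-of-file has been reached. (fread signals the same condition differently, by returning fewer items than requested.) You should remember though, that these functions are an abstraction over a lower-level system call interface (e.g. read).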

From the read(2) man page:

On success, the number of bytes read is returned (zero indicates end of file)

Let's consider an ASCII file of size 3 that has the content ABC. In a hex editor, it would look like this:

0000    41 42 43                        ABC

Now, we run the following code:

#include <fcntl.h>      // open, O_RDONLY
#include <unistd.h>     // read, close

int main(void)
{
    int fd = open("ourfile.txt", O_RDONLY);
    if (fd == -1)
        return 1;       // could not open the file

    char c;

    read(fd, &c, 1);    // returns 1, c gets 'A'
    read(fd, &c, 1);    // returns 1, c gets 'B'
    read(fd, &c, 1);    // returns 1, c gets 'C'
    read(fd, &c, 1);    // returns 0, c is unmodified

    close(fd);
    return 0;
}

So you see that end-of-file is a state that is indicated, and not an actual data value.

Jonathon Reinhart
  • Re "*Some operating systems [...] used Control-Z to mark the end of the actual text in the file.*", DOS inherited a bit of that in-band Ctrl-Z signaling. I don't know if it was the OS or apps or a mix of both, but text after a chr(26) could be ignored when using text mode. In modern unixy systems, there's no difference between text and binary file opening modes (and that may always have been the case), but even to this day there's a difference in Windows. It controls whether CRLF<->LF translation is done, but Ctrl-Z is not treated specially for text files anymore. – ikegami Jul 21 '20 at 02:11
  • @ikegami Good point about Ctrl-Z on Windows. There are vestiges that still cause problems. For example, see [this question about a .NET SerialPort problem](https://stackoverflow.com/questions/12483711/serialdata-eof-circumstances) stemming from Ctrl-Z. – Jonathon Reinhart Jul 21 '20 at 11:00
  • For the most part though, in-band signaling of EOF is archaic, and file system size is the real thing that controls where the end of the file is. In a modern OS, even if there were a `1A` byte in the stream, you could read past it (even if it did signal something). – Jonathon Reinhart Jul 21 '20 at 11:03