
I was wondering if the mode (text or binary) that we use for opening a file in C really matters.

For example, in general, we use fread and fwrite for reading and writing in binary mode. Can we use those functions when we open a file in textual mode?

Besides that, in general, we can use fscanf and fprintf for reading and writing in textual mode. Can we use those functions if we open a file in binary mode?

What are the consequences of opening a file in textual or binary modes?

Steve Summit
Zaratruta
  • The most important issue is that in binary mode every byte is simply read or written as is, while in text mode line terminators (or at least the system's own; this might be implementation dependent) are converted to a single `\n` on reading, and on writing each `\n` is converted to the system's line terminator (though, if I recall correctly, there's a facility to change the line terminator, if need be). – Aconcagua Sep 01 '22 at 13:49
  • You didn't mention your operating system. Just to note, on Linux, the binary flag is ignored. – Jason Sep 01 '22 at 14:09
  • @Jason The C standard allows ignoring the binary flag? Is this different on Windows? – Zaratruta Sep 01 '22 at 14:14
  • @Zaratruta that is sort of out of the scope of the standard. The standard only states that the "b" is for binary mode and is acceptable input. How binary vs text mode is actually handled is operating system dependent. On Linux, binary mode == text mode. – Jason Sep 01 '22 at 14:28
  • @Zaratruta On Linux the line terminator is `\n`, so no action needs to be taken to translate between `\n` and `\n`. The two modes are identical on Linux, so they don't need to be handled differently. – Avi Berger Sep 01 '22 at 15:16
  • It's important to note that the distinction between text and binary mode is entirely made when you open the file. The choice of actual calls to read and write the file makes no difference. You can use `fread` and `fwrite` perfectly well to read/write text files. You can theoretically use all the rest of the calls on binary files, although really only `getc` and `putc` make sense. (In other words, calling `fread` or `fwrite` on a text stream does **not** cause it to be treated as binary.) – Steve Summit Sep 01 '22 at 15:55
  • @Zaratruta *The C standard allows ignoring the binary flag?* [Yes](http://port70.net/~nsz/c/c11/n1570.html#note266): "An implementation need not distinguish between text streams and binary streams. In such an implementation, there need be no new-line characters in a text stream nor any limit to the length of a line." – Andrew Henle Sep 01 '22 at 16:16
  • *The C standard allows ignoring the binary flag?* Absolutely yes! What would it even mean for the Standard to force an implementation to pay attention to the flag, on a system (such as Unix) where there was no difference to pay attention to? But yes, on Windows there is a distinction, so the flag is paid attention to, and makes a difference. – Steve Summit Sep 01 '22 at 16:38

1 Answer


On POSIX-compliant operating systems (e.g. Linux, macOS), there is no difference between binary mode and text mode.

However, that is not the case on Microsoft Windows.

Internally, text files on Microsoft Windows normally use `\r\n` (carriage return followed by line feed) to terminate a line. If you open a text file in binary mode, you will therefore see `\r\n` at the end of every line (except maybe the last line). On the other hand, if you open a text file in text mode, the C runtime library will automatically translate every `\r\n` to `\n` when reading. When writing in text mode, the translation goes the other way: every `\n` is written as `\r\n`.
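To see the translation on the writing side, here is a minimal sketch. The helper name `raw_size_after_text_write` and the file name in the usage note are made up for illustration. It writes a string through a text-mode stream and then counts the bytes that actually ended up in the file by reading it back in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `text` to `path` through a text-mode
   stream, then reopens the file in binary mode and counts the raw
   bytes stored on disk. Returns -1 on any I/O error. */
static long raw_size_after_text_write(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *out = fopen(path, "w");          /* text mode */
    if (out == NULL)
        return -1;
    int put_failed = (fputs(text, out) == EOF);
    if (fclose(out) != 0 || put_failed)
        return -1;

    FILE *in = fopen(path, "rb");          /* binary mode: no translation */
    if (in == NULL)
        return -1;
    long bytes = 0;
    while (fgetc(in) != EOF)
        bytes++;
    fclose(in);
    return bytes;
}
```

Calling `raw_size_after_text_write("demo.tmp", "a\nb\n")` should give 4 on a POSIX system, while on Windows I would expect 6, because each `\n` is stored as `\r\n`.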

As a consequence of this translation, the effective file size (i.e. the number of times you can call `fgetc` until you reach the end of the file) is different depending on whether you use text mode or binary mode.
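The difference in effective size can be checked with a sketch like the following. Both helper names (`write_raw`, `count_chars`) and the file name are assumptions chosen for this example: one stores bytes without translation, the other counts successful `fgetc` calls for a given mode string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores `len` bytes in `path` without any
   translation. Returns 0 on success, -1 on failure. */
static int write_raw(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: counts how many times fgetc succeeds before
   EOF when `path` is opened with `mode` ("r" or "rb").
   Returns -1 if the file cannot be opened. */
static long count_chars(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

After `write_raw("demo.tmp", "a\r\nb\r\n", 6)`, `count_chars("demo.tmp", "rb")` reports all 6 bytes. With `"r"`, a POSIX system also reports 6, whereas on Windows the count should drop to 4 because each `\r\n` pair is delivered as a single `\n`.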

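Another consequence of this translation is that you cannot reliably use the function `fseek` in text mode to jump a certain number of characters in the file. For a text stream, ISO C only guarantees seeking to offset zero or, with `SEEK_SET`, to a position previously returned by `ftell`; seeking to an arbitrary byte offset is only well defined in binary mode. The reason for this is that in text mode, the number of bytes between two file positions may differ from the number of characters you read between them.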
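What does remain portable in text mode is saving a position with `ftell` and returning to it later with `fseek(..., SEEK_SET)`. A minimal sketch of that pattern follows; the function name `reread_second_line` and the 128-byte line buffers are arbitrary choices for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the second line of a text file, seeks
   back to where that line started, and reads it again.
   Returns 1 if both reads match, 0 if they differ, -1 on I/O error. */
static int reread_second_line(const char *path)
{
    assert(path != NULL);
    char first[128], a[128], b[128];

    FILE *f = fopen(path, "r");            /* text mode */
    if (f == NULL)
        return -1;

    if (fgets(first, sizeof first, f) == NULL) {
        fclose(f);
        return -1;
    }

    /* In text mode, treat this value as an opaque token, not as a
       byte count to do arithmetic on. */
    long pos = ftell(f);
    if (pos == -1L || fgets(a, sizeof a, f) == NULL) {
        fclose(f);
        return -1;
    }

    /* Returning to a position obtained from ftell is the portable use. */
    if (fseek(f, pos, SEEK_SET) != 0 || fgets(b, sizeof b, f) == NULL) {
        fclose(f);
        return -1;
    }

    fclose(f);
    return strcmp(a, b) == 0;
}
```

The sketch never adds to or subtracts from `pos`, which is the operation that is only meaningful in binary mode.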

Another issue on Microsoft Windows is that in text mode, the byte value `'\x1A'` (Ctrl+Z) is interpreted as the end of the file when reading. Therefore, if you open a binary file in text mode, then the file may appear a lot shorter than it actually is, if it happens to contain a byte with the value `'\x1A'`.
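A small sketch to observe this, with a made-up helper name `readable_around_ctrl_z` and a fixed five-byte payload chosen for illustration: it stores a `0x1A` byte in the middle of the file and counts how many characters can be read back in a given mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores the bytes 'A' 'B' 0x1A 'C' 'D' in
   `path` (binary mode, so nothing is translated), then counts how
   many characters fgetc delivers when the file is reopened with
   `mode`. Returns -1 on any I/O error. */
static long readable_around_ctrl_z(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
    static const unsigned char payload[] = { 'A', 'B', 0x1A, 'C', 'D' };

    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;
    size_t written = fwrite(payload, 1, sizeof payload, out);
    if (fclose(out) != 0 || written != sizeof payload)
        return -1;

    FILE *in = fopen(path, mode);
    if (in == NULL)
        return -1;
    long n = 0;
    while (fgetc(in) != EOF)
        n++;
    fclose(in);
    return n;
}
```

With `"rb"` the result is 5 everywhere. With `"r"`, a POSIX system still gives 5, while on Windows I would expect the count to stop at 2, just before the `0x1A` byte.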

However, apart from the issues mentioned above, you can use the functions `fread`, `fwrite`, `fprintf` and `fscanf` at will in both modes. But the formatted I/O functions `fprintf` and `fscanf` will probably be more useful with text files opened in text mode, whereas the unformatted I/O functions `fread` and `fwrite` will probably be more useful with binary files opened in binary mode.
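As an illustration that the functions are not tied to the mode, this sketch deliberately uses them "the wrong way round": `fprintf` on a binary stream and `fread` on a text stream. The helper name `roundtrip` and the formatted content are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes formatted text through a BINARY stream
   with fprintf, then reads it back through a TEXT stream with fread.
   Stores a NUL-terminated copy in `buf` and returns the number of
   characters read (0 on failure). `cap` must be at least 1. */
static size_t roundtrip(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);
    buf[0] = '\0';

    FILE *out = fopen(path, "wb");         /* binary mode, formatted output */
    if (out == NULL)
        return 0;
    int printed = fprintf(out, "%d %s\n", 42, "ok");
    if (fclose(out) != 0 || printed < 0)
        return 0;

    FILE *in = fopen(path, "r");           /* text mode, unformatted input */
    if (in == NULL)
        return 0;
    size_t n = fread(buf, 1, cap - 1, in);
    fclose(in);
    buf[n] = '\0';
    return n;
}
```

With a buffer of 32 bytes, the call returns 6 and `buf` holds `"42 ok\n"`. The mode passed to `fopen` decides whether translation happens; the choice of `fprintf` or `fread` does not.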

Andreas Wenzel
  • Does POSIX-compliance also nullify the differences in behaviour (between binary and text modes) for [fseek](https://en.cppreference.com/w/c/io/fseek) and [ftell](https://en.cppreference.com/w/c/io/ftell)? – Adrian Mole Sep 01 '22 at 15:40
  • @AdrianMole If you mean, does POSIX guarantee you can treat `fseek`/`ftell` offset values as pure and useful byte counts, I believe the answer is "yes". – Steve Summit Sep 01 '22 at 15:57
  • @AdrianMole: I believe it does. The [POSIX definition of fseek](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/fseek.html#) does not seem to distinguish between text streams and binary streams, whereas the [ISO C definition of fseek](http://port70.net/~nsz/c/c11/n1570.html#7.21.9.2) does. – Andreas Wenzel Sep 01 '22 at 16:02
  • @AndreasWenzel Nevermind [the return value from `ftell()` isn't a byte offset for a text stream per the C standard](http://port70.net/~nsz/c/c11/n1570.html#7.21.9.4p2): "... For a text stream, its file position indicator contains unspecified information, usable by the `fseek` function for returning the file position indicator for the stream to its position at the time of the `ftell` call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read." – Andrew Henle Sep 01 '22 at 16:45