64

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.

unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size  = 1;
int count = 256;

Binary mode:

FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);

Versus text mode:

FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
Cœur
jholl

7 Answers

66

I believe that most platforms ignore the "t" option (the "text-mode" option) when dealing with streams. On Windows, however, this is not the case. The description of the fopen() function on MSDN says that specifying the "t" option has the following effects:

  • line feeds ('\n') will be translated to "\r\n" sequences on output
  • carriage return/line feed sequences will be translated to line feeds on input.
  • If the file is opened for reading and writing in append mode ("a+"), the end of the file will be examined for a Ctrl-Z character (character 26) and that character removed, if possible. On input, the presence of that character is also interpreted as the end of the file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the Ctrl-Z character will not be appended.
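The first bullet can be checked against the 256-byte buffer from the question. The sketch below is only an illustration: `size_after_write` is a hypothetical helper name, and it uses the standard "w" mode for text rather than the Microsoft-specific "wt". It writes the byte values 0 to 255 and then counts, in binary mode, how many bytes ended up in the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): writes the byte values
   0..255 to `path` using `mode` ("w" or "wb"), then reopens the file in
   binary mode and returns the number of bytes it contains, or -1 on
   failure. */
long size_after_write(const char *path, const char *mode)
{
    unsigned char buffer[256];
    FILE *fp;
    long size = 0;
    int i;

    assert(path != NULL && mode != NULL);
    for (i = 0; i < 256; i++)
        buffer[i] = (unsigned char)i;

    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    if (fwrite(buffer, 1, sizeof buffer, fp) != sizeof buffer) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    /* Count the bytes that actually reached the file. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        size++;
    fclose(fp);
    return size;
}
```

With "wb" the result should be 256 on any platform. With "w" under the Microsoft runtime it should be 257, because the single 0x0A byte is expanded to 0x0D 0x0A; on Linux it stays 256.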
Jon Trauntvein
  • 13
    carriage return is actually '\r', '\n' is line feed. – Christoffer Hammarström Apr 19 '10 at 12:05
  • Does it have this behavior for all kinds of file operations? Eg. fread and fwrite (which are primarily used with binary files)? – Calmarius Oct 18 '13 at 16:34
  • The translation is specified when the file handle is opened and takes place at a low level. It will take place regardless of the functions that you use to read (or write) the file. – Jon Trauntvein Oct 21 '13 at 18:18
  • @Cheersandhth.-Alf -1 for repeating what has already been said 4 years before. – Virus721 Feb 17 '16 at 10:11
  • @Virus721: Oh thanks, he fixed it two days later and I didn't notice. (Not that I understand your comment, but it did guide my attention.) – Cheers and hth. - Alf Feb 17 '16 at 10:34
  • Removed downvote because the answer's been corrected. – Cheers and hth. - Alf Feb 17 '16 at 10:34
  • Hi, can someone also explain what difference these carriage returns make? I mean, why are they added before \n on output and removed before \n on input? – Prakhar Agrawal Jun 08 '16 at 12:48
  • 1
    @Prakhar Agrawal: As I recall, the CR and LF codes date back to the days of the teletype. The carriage return ("\r") code would be sent to make the machine return its print head to the home position on the line and the line feed ("\n") would be sent to advance the platen forward by one line. These concepts were carried forward in terminal emulators even when they had largely lost their meanings so far as a physical equivalent. – Jon Trauntvein Apr 17 '17 at 14:29
31

In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n".

Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work; the data will be corrupted. You can read text OK in binary mode, though; it just won't do automatic translations of "\n" to "\r\n".

See fopen
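Reading text in binary mode means handling the line endings yourself. A minimal sketch, assuming lines fit in the caller's buffer; `strip_line_ending` and `read_first_line` are hypothetical names introduced here for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes one trailing "\n" or "\r\n" from `line`
   in place and returns the new length. */
size_t strip_line_ending(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        /* Only drop the '\r' when it was part of a "\r\n" pair. */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';
    }
    return len;
}

/* Hypothetical helper: reads the first line of `path` in binary mode
   into `buf` (capacity `cap`) and strips its line ending.
   Returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *buf, size_t cap)
{
    FILE *fp;

    assert(path != NULL && buf != NULL && cap > 0);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)cap, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    strip_line_ending(buf);
    return 0;
}
```

Because the file is opened with "rb", the same code sees the same bytes on Windows and on Unix-like systems, whichever line ending the file uses.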

Zebra North
  • 4
    For reading, the translation works the opposite of what you describe - converting "\r\n" to "\n". – Mark Ransom Oct 23 '08 at 15:19
  • 1
    techtonik: All platforms will allow you to specify text mode, but on unix/linux it is no different to binary mode. Only on Windows does it make a difference. (And possibly some more obscure platforms - you'd have to check your platform documentation to be sure there) – Zebra North Feb 07 '14 at 19:38
6

Additionally, when you fopen a file with "rt", the input is terminated at the first Ctrl-Z character.
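One way to observe this is to store a Ctrl-Z in the middle of a file and count how many bytes each read mode delivers. This is a sketch only; `readable_bytes_with_ctrl_z` is a hypothetical helper name, and the standard "r" mode stands in for "rt".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes 'A' 'B' 0x1A 'C' 'D' to `path` in binary
   mode, then returns how many bytes can be read back when the file is
   opened with `read_mode`, or -1 on failure. */
long readable_bytes_with_ctrl_z(const char *path, const char *read_mode)
{
    static const unsigned char data[] = { 'A', 'B', 0x1A, 'C', 'D' };
    FILE *fp;
    long count = 0;

    assert(path != NULL && read_mode != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, sizeof data, fp) != sizeof data) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, read_mode);
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

With "rb" all 5 bytes should come back everywhere. With "r" under the Microsoft runtime the count should be 2, since reading stops at the Ctrl-Z; on Linux it should still be 5.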

SmacL
  • 4
    True - I make my own file formats start with something like "my-file-type^Z", then if you "type"/"cat" it from the command line, it just gives you the file's "magic numbers" and stops instead of spewing binary to your terminal. – Zebra North Oct 23 '08 at 14:46
5

Another difference shows up when using fseek:

If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.

If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
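In practice, the portable pattern for a text stream is to treat the ftell result as an opaque token and hand it back to fseek unchanged. A minimal sketch, assuming lines shorter than 128 characters; `reread_second_line` is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: in text mode, reads line 1, remembers the position
   with ftell, reads line 2, seeks back to the remembered position and
   reads line 2 again. Returns 1 if both reads of line 2 match, 0 if they
   differ, -1 on error. */
int reread_second_line(const char *path)
{
    char skip[128], first_pass[128], second_pass[128];
    FILE *fp;
    long mark;

    assert(path != NULL);
    fp = fopen(path, "r"); /* text mode */
    if (fp == NULL)
        return -1;
    if (fgets(skip, sizeof skip, fp) == NULL)
        goto fail;
    mark = ftell(fp); /* opaque token in text mode, not a byte count */
    if (mark == -1L)
        goto fail;
    if (fgets(first_pass, sizeof first_pass, fp) == NULL)
        goto fail;
    if (fseek(fp, mark, SEEK_SET) != 0) /* only SEEK_SET is valid here */
        goto fail;
    if (fgets(second_pass, sizeof second_pass, fp) == NULL)
        goto fail;
    fclose(fp);
    return strcmp(first_pass, second_pass) == 0;

fail:
    fclose(fp);
    return -1;
}
```

The code never does arithmetic on `mark`; adding to it or subtracting from it is exactly what the text-mode rule above rules out.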

Community
Ming
5

Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.

#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f;
    char string[] = "A\nB";
    int len;

    len = (int)strlen(string);
    printf("As you'd expect string has %d characters... ", len); /* prints 3 */
    f = fopen("test.txt", "w"); /* Text mode */
    if (f == NULL) {            /* Bail out if the file cannot be created */
        perror("fopen");
        return 1;
    }
    fwrite(string, 1, len, f);  /* On Windows "A\r\nB" is written */
    printf("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}

If you execute the program on Windows, you will see the following message printed:

As you'd expect string has 3 characters... but 4 bytes were written to file

Of course, you can also open the file with a text editor like Notepad++ and see the characters for yourself (the original screenshot is not reproduced here).

The inverse conversion is performed on Windows when reading the file in text mode.
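The read direction can be checked in the same spirit. The sketch below stores "A\r\nB" without translation and then reads it back in a chosen mode; `read_back_crlf` is a hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores the 4 bytes 'A' '\r' '\n' 'B' in `path`
   using binary mode (so nothing is translated), reads them back with
   `read_mode`, and copies what was read into `out` (capacity `cap`).
   Returns the number of characters read, or -1 on failure. */
long read_back_crlf(const char *path, const char *read_mode,
                    char *out, size_t cap)
{
    FILE *fp;
    long n = 0;
    int c;

    assert(path != NULL && read_mode != NULL && out != NULL && cap > 0);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fputs("A\r\nB", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, read_mode);
    if (fp == NULL)
        return -1;
    /* Leave room for the terminating '\0'. */
    while ((size_t)n + 1 < cap && (c = fgetc(fp)) != EOF)
        out[n++] = (char)c;
    out[n] = '\0';
    fclose(fp);
    return n;
}
```

Reading with "rb" should return all 4 bytes on any platform. Reading with "r" under the Microsoft runtime should return 3 characters ("A\nB"), while on Linux it should still return 4.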

David Lopez
4

We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:

1\n\r
2\n\r
3\n
4\n\r
5\n\r

Our requirement was to store our current position in the file (we used fgetpos), close the file, and later reopen it and seek back to that position (we used fsetpos).

However, when a file had a mixture of line endings, this process failed to seek back to the same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.

Go with binary - then you can control exactly what is read and written from the file.
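A sketch of that approach, using plain byte offsets from ftell and fseek rather than fgetpos and fsetpos (a substitution made here for simplicity, not what the answer describes). `offset_after_lines` and `byte_at` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: opens `path` in binary mode, consumes `lines`
   lines (each ending in '\n', with or without a preceding '\r'), and
   returns the byte offset reached, or -1 on failure. */
long offset_after_lines(const char *path, int lines)
{
    FILE *fp;
    long offset;
    int c;

    assert(path != NULL && lines >= 0);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while (lines > 0 && (c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines--;
    offset = ftell(fp); /* an exact byte count in binary mode */
    fclose(fp);
    return offset;
}

/* Hypothetical helper: reopens `path` in binary mode and returns the byte
   stored at `offset`, or EOF if the seek or the read fails. */
int byte_at(const char *path, long offset)
{
    FILE *fp;
    int c = EOF;

    assert(path != NULL && offset >= 0);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return EOF;
    if (fseek(fp, offset, SEEK_SET) == 0)
        c = fgetc(fp);
    fclose(fp);
    return c;
}
```

Since no translation happens in binary mode, the saved offset stays valid after the file is closed and reopened, even when line endings are mixed.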

Richard Corden
0

In "w" mode, the file is opened for writing in text mode, so the runtime may translate what you write. In "wb" mode, the file is opened for writing in binary mode: the bytes are written exactly as supplied, so the program itself is responsible for any special characters and for the encoding (UTF-16LE or another).