15

It was an unpleasant surprise to learn that '\n' is replaced with "\r\n" on Windows; I did not know that. (I am guessing it is also replaced on Mac...)

Is there an easy way to ensure that Linux, Mac and Windows users can easily exchange text files?

By easy way I mean: without writing the file in binary mode or testing and replacing the end-of-line chars myself (or with some third-party program/code). This issue affects my C++ program doing the text file I/O.

Ali
  • What editor are you using? What source control are you using? – Ates Goral Dec 31 '11 at 16:58
  • 2
    @AtesGoral These are irrelevant to the executable doing the text-based I/O. – Ali Dec 31 '11 at 17:01
  • 4
    "without writing the file in binary mode". This would be the "easy" way, why do you want to avoid it? – CB Bailey Dec 31 '11 at 17:01
  • Sorry, but line feeds are not "secretly" replaced. This behavior is well documented. From [an online tutorial on files](http://www.cplusplus.com/doc/tutorial/files/): "Non-binary files are known as text files, and some translations may occur due to formatting of some special characters (like newline and carriage return characters)." – André Caron Dec 31 '11 at 17:12
  • @CharlesBailey I did not know that you can use operator<< in binary mode :) I have only used write in binary mode. I expected problems on reading but it seems to work fine. Still testing... – Ali Dec 31 '11 at 17:15
  • @Ali Ah, without reading the question fully, I assumed the problem was with the source code :/ – Ates Goral Dec 31 '11 at 17:36
  • 2
    @CharlesBailey As it turns out, binary mode is the solution. It was my lack of knowledge... – Ali Dec 31 '11 at 18:24

3 Answers

14

Apologies for the partial overlap with other answers, but for the sake of completeness:

Myth: endl is 'more portable' since it writes the line ending depending on the platform convention.

Truth: endl is defined to write \n to the stream and then call flush. So in fact you almost never want to use it. All \n that are written to a text-mode stream are implicitly converted to \r\n by the CRT behind the scenes, whether you use os<<endl, os<<'\n', or fputs("\n",file).
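To make the point about endl concrete, here is a minimal sketch that compares the characters produced by the two spellings. The function names `line_with_endl` and `line_with_newline` are made up for illustration, and a `std::ostringstream` stands in for a file so that the result can be inspected directly.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Characters produced by streaming a line followed by std::endl.
std::string line_with_endl() {
    std::ostringstream os;
    os << "line" << std::endl;           // writes '\n', then calls os.flush()
    return os.str();
}

// Characters produced by streaming a line followed by '\n' and an explicit flush.
std::string line_with_newline() {
    std::ostringstream os;
    os << "line" << '\n' << std::flush;  // same characters, flush spelled out
    return os.str();
}
```

Both functions return the same string, so endl adds nothing platform-specific on its own. Any \n to \r\n translation happens later, in a text-mode file stream, regardless of which spelling was used.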

Myth: You should open files in text mode to write text and in binary mode to write binary data.

Truth: Text mode exists in the first place because some time ago there were file systems that distinguished between text files and binary files. It's no longer true on any sane platform I know. You can write text to binary-opened files just as well; you just lose the automatic \n -> \r\n conversion on Windows. However, this conversion causes more harm than good. Among other things, it makes your code behave differently on different platforms, and tell/seek become problematic to use. Therefore it's best to avoid this automatic conversion. Note that POSIX does not distinguish between binary and text mode.

How to do text: Open everything in binary mode and use the plain-old \n. You'll also need to worry about the encoding. Standardize on UTF-8 for Unicode-correctness. Use UTF-8 encoded narrow-strings internally, instead of wchar_t which is different on different platforms. Your code will become easier to port.
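As a rough sketch of this advice, the following writes text with operator<< to a stream opened with `std::ios::binary` and reads the bytes back unchanged. The helper names `write_lines_binary` and `read_raw_bytes` are hypothetical, and any file name passed to them is only an example.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes two lines with operator<< to a stream opened in binary mode,
// so each '\n' reaches the file as a single LF byte on every platform.
void write_lines_binary(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out << "first line" << '\n' << "second line" << '\n';
}

// Reads the file back byte for byte (binary mode again, so nothing is translated).
std::string read_raw_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

After `write_lines_binary("eol_demo.txt")`, `read_raw_bytes("eol_demo.txt")` should contain no '\r' on any platform, which is what makes the file identical across Linux, Mac and Windows.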

Tip: You can force MSVC to open all files in binary mode by default. It should work as follows:

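#include <fcntl.h>   // _O_BINARY
#include <stdlib.h>  // _fmode (MSVC-specific)
#include <fstream>   // std::ofstream
int main() {
    _fmode = _O_BINARY;       // MSVC only: make binary the default translation mode
    std::ofstream f("a.txt"); // opens in binary mode
}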

EDIT: As of 2021, Windows 10 Notepad understands UNIX line endings.

Yakov Galka
  • 1
    @LokiAstari: I'm not advocating `fopen`, it was just the simplest and the most explicit example. You may like the edited version more. – Yakov Galka Dec 31 '11 at 19:25
  • 2
    @ybungalobill: Using `'\n'` in binary mode yields Unix line endings. On Windows, this breaks crappy text editors like notepad and almost any textbox you paste such content into (even when copied from an editor that handles Unix line-endings). Is this really what you are advocating, or have I completely misread you? – Marcelo Cantos May 18 '12 at 10:38
  • 1
    @MarceloCantos: Notepad is an excuse for a text editor. When copying&pasting some editors convert '\n' into '\r\n' (e.g. Wordpad or web browsers I checked), although I believe it's the receiver's responsibility to understand '\n'. That said, I admit that the guideline is not acceptable if the text file is intended for a non-technical end-user, since she won't care how 'correct' your program is. – Yakov Galka May 18 '12 at 11:09
  • 1
    @ybungalobill: This isn't about non-technical users. I know of no text editor running on Windows that adheres to the policy you are advocating. Even emacs and vim emit CRLF by default. Sane or not, Windows does distinguish between text and binary, and to ignore this is just asking for trouble. Note that I don't object to your advice as an answer to this question, which is about cross-platform portability. What concerns me is the sense I get that you are advocating the use of binary I/O under all circumstances. If that wasn't your intent, then I apologise for drawing the wrong conclusion. – Marcelo Cantos May 18 '12 at 12:32
  • @MarceloCantos: Notepad++ is configurable to emit LF. But it is not a problem if some editor *emits* CRLF, since when you read text files you usually ignore the whitespace, and CR is just a whitespace character, so you do not have any trouble *reading* these files from C++. The discussion was about reading the output of your program. Also, would you care to back up your claim that "Windows does distinguish between text and binary"? I do not see any text/binary related flags in `CreateFile`... – Yakov Galka Feb 04 '13 at 07:05
  • @ybungalobill: Not all library calls ignore whitespace. `fgets()` reads CR into the buffer, causing different behaviour depending on the line endings in the input file. Whether this matters or not is contingent on the programmer's intent; it should not be subject to a non-negotiable rule of the kind you recommend. WRT my "claims": Most C programs (especially portable ones) use `fopen()`, which treats text and binary files differently on Windows (on every runtime library I've used, at least). – Marcelo Cantos Feb 04 '13 at 22:19
  • @MarceloCantos: I haven't said that the library ignores it; it's the programmer who usually wants to ignore it. Even in the `fgets` case you will find yourself trimming the whitespace at the end. As for `fopen`, it is not a Windows function, so one cannot infer anything about Windows per se from it. It does do the conversion by default on the MSVC runtime, but the default can be overridden, as I've shown, to provide POSIX-like behavior. In fact, imagine that you are *porting* something *to* Windows. In such a case it is simpler to change the default than to hunt the bugs. – Yakov Galka Feb 05 '13 at 06:51
12

The issue isn’t with endl at all, it’s that text streams reformat line breaks depending on the system’s standard.

If you don’t want that, simply don’t use text streams – use binary streams. That is, open your files with the ios::binary flag.

That said, if the only issue is that users can exchange files, I wouldn’t bother with the output mode at all; I’d rather make sure that your program can read different formats without choking. That is, it should accept different line endings.
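One possible way to accept different line endings is sketched below. It is an illustration and not part of any library: `read_lines_any_eol` and `split_lines_any_eol` are hypothetical names, and the sketch assumes the input stream was opened in binary mode (or is an in-memory stream), so no translation has happened before the function sees the bytes.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits a stream into lines, treating LF, CRLF and a lone CR as line terminators.
std::vector<std::string> read_lines_any_eol(std::istream& in) {
    std::vector<std::string> lines;
    std::string current;
    bool last_was_cr = false;
    char c;
    while (in.get(c)) {
        if (c == '\r') {
            lines.push_back(current);       // CR ends the line (CR or CRLF style)
            current.clear();
            last_was_cr = true;
        } else if (c == '\n') {
            if (!last_was_cr) {             // an LF right after CR belongs to the same CRLF
                lines.push_back(current);
                current.clear();
            }
            last_was_cr = false;
        } else {
            current.push_back(c);
            last_was_cr = false;
        }
    }
    if (!current.empty()) lines.push_back(current);  // last line without a terminator
    return lines;
}

// Convenience wrapper for text that is already in memory.
std::vector<std::string> split_lines_any_eol(const std::string& text) {
    std::istringstream in(text);
    return read_lines_any_eol(in);
}
```

With this kind of reader, it no longer matters which convention the writing program or the user's editor chose. A simpler variant that only needs LF and CRLF could call `std::getline` and strip a trailing '\r' instead.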

This is by the way what any decent text editor does (but then again, the default notepad.exe on Windows is not a decent text editor, and won’t correctly handle Unix line breaks).

Konrad Rudolph
7

If you really just want an ASCII LF, the easiest way is to open the file in binary mode: in non-binary mode \n is replaced by a platform-specific end-of-line sequence (e.g. it may be replaced by an LF/CR or a CR/LF sequence; on UNIXes it typically is just LF). In binary mode this is not done. Turning off the replacement is also the only effect of the binary mode.

BTW, using endl is equivalent to writing a \n followed by flushing the stream. The typically unintended flush can become a major performance problem. Thus, endl should be used rarely and only when the flush is intended.
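To see how many flushes endl actually triggers, here is a small sketch with a stream buffer that discards its output and counts calls to `sync()`. The class `FlushCountingBuf` and the two helper functions are invented for this illustration; a real file stream would perform a write to the operating system at each of these points instead of incrementing a counter.

```cpp
#include <ostream>
#include <streambuf>

// A stream buffer that discards all output and counts how often it is flushed.
class FlushCountingBuf : public std::streambuf {
public:
    int flushes = 0;
protected:
    int_type overflow(int_type ch) override {
        return traits_type::not_eof(ch);  // accept and discard the character
    }
    int sync() override {
        ++flushes;                        // reached via std::ostream::flush()
        return 0;
    }
};

// Writes `lines` lines terminated by std::endl and returns the number of flushes.
int flushes_with_endl(int lines) {
    FlushCountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < lines; ++i) os << i << std::endl;
    return buf.flushes;
}

// Writes `lines` lines terminated by '\n', flushes once at the end, and returns the count.
int flushes_with_newline(int lines) {
    FlushCountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < lines; ++i) os << i << '\n';
    os.flush();
    return buf.flushes;
}
```

For 1000 lines, the endl version flushes 1000 times while the '\n' version flushes once, which is where the performance difference comes from.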

Dietmar Kühl