For historical reasons, different OSes use distinct sequences to mark line endings in text files:
- Unix and companions: \n (linefeed)
- DOS and Windows: \r\n (carriage-return, linefeed)
- Mac OS (before Mac OS X): \r (carriage-return). Mac OS X, which got a BSD Unix kernel, might support both: A Line Break Is a Line Break.
This is all a mess, e.g.:
- Sometimes Windows text files look a bit strange in XEmacs, with every line decorated with a ^M at the line end.
- Windows Notepad (the included plain text editor) shows Linux text files as one single line.
Once you switch periodically between different OSes, you get used to the fact that line endings have to be fixed from time to time. There are numerous helper tools for this, e.g. unix2dos and dos2unix in Cygwin, special commands in Notepad++, prompts in Visual Studio, etc.
In C, a line ending is always represented by \n, even on DOS and Windows. (I have no experience with Mac OS, but I would be surprised if it weren't the same there.) To make this work seamlessly, MS decided to "fix" file contents during reading and writing "under the hood": while reading a file, all occurrences of \r\n are silently replaced by \n, and writing a file inserts a \r before each written \n.
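The read-side replacement can be pictured as a small in-memory function. This is only a sketch of the observable behavior, not the actual runtime code; the name crlf_to_lf is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: collapse every "\r\n" pair in buf[0..len) to "\n",
 * in place, and return the new length. A lone '\r' is left untouched. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; ++in) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue; /* drop the '\r'; the '\n' is copied in the next round */
        buf[out++] = buf[in];
    }
    return out;
}
```

For a buffer holding a\r\nb\r\n (6 bytes), the function returns 4: two bytes have silently disappeared.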
This has some annoying drawbacks:
- If a file of a certain size is read, the "received" contents might be some bytes smaller. (I once stumbled over this when I tried to reserve space before loading a file and then read the whole contents at once. I wondered why some bytes seemed to be missing after loading.)
- This may break loading of binary files, where \n simply represents a binary value of 10 that can mean anything (not only a line break).
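The damage to binary data is easiest to see on the write side. The following is a sketch that mimics what text-mode output does to a byte buffer; lf_to_crlf is a hypothetical name, and the real translation happens inside the runtime's stdio layer.

```c
#include <stddef.h>

/* Hypothetical helper: copy src[0..len) to dst, inserting '\r' (13) before
 * every '\n' (10), as text-mode output does on Windows.
 * dst must have room for 2 * len bytes. Returns the number of bytes written. */
size_t lf_to_crlf(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        if (src[i] == '\n')
            dst[out++] = '\r'; /* the extra byte that corrupts binary data */
        dst[out++] = src[i];
    }
    return out;
}
```

Fed with the four bytes 0x42 0x4D 0x0A 0x00, it produces the five bytes 0x42 0x4D 0x0D 0x0A 0x00. Every byte after the inserted 0x0D is shifted by one, so any offsets stored in the file no longer match.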
To fix this, the C API provides additional modes for file I/O. E.g. fopen() supports, beyond r, w, and a, an extra character to indicate the file type:
- b denotes binary I/O (don't touch contents)
- t denotes text I/O (fix line endings); this one is an extension of the Microsoft runtime, not part of standard C.
Without either of them, the default is text I/O.
On Windows, as well as for portable file I/O, b should always be given when the file is binary. (On Linux, it simply has no effect, and in particular does no harm.)
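As a minimal sketch of the binary mode in use, the following loads a whole file without any translation. The helper name read_whole_file is made up, and determining the size via fseek()/ftell() is an assumption that holds on common platforms but is not strictly guaranteed by the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load a whole file unmodified.
 * Returns a malloc'ed buffer (the caller frees it) and stores the number of
 * bytes actually read in *len. Returns NULL on failure. */
unsigned char *read_whole_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb"); /* 'b': no line-ending translation */
    if (!f) return NULL;

    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }

    /* Trust the return value of fread(), not the size determined up front. */
    *len = fread(buf, 1, (size_t)size, f);
    fclose(f);
    return buf;
}
```

With "r" instead of "rb", the same code would report a smaller *len on Windows for any file containing \r\n pairs.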
I once wrote an answer to SO: Copying a bmp in c, where a dump of a broken BMP file nicely illustrated the effect of file output done in the wrong mode.
After this long story about text and binary file I/O, it might be obvious that this is always a potential issue for developers dealing with image data (which is usually binary-encoded).
Hence, I can imagine that the \r\n\032\n sequence is simply a test pattern for this. If these 4 bytes don't have exactly these values, chances are good that
- the file was opened with the wrong mode (on a platform where this is relevant) or
- a previous tool damaged the contents of the file.
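A decoder could use the pattern roughly as follows. This is a sketch under my own assumptions, not taken from any real decoder: the enum and the function check_guard are hypothetical, and only the two most obvious kinds of damage are recognized.

```c
#include <stddef.h>
#include <string.h>

/* Possible outcomes of the check (names are made up for this sketch). */
enum guard_status { GUARD_OK, GUARD_CRLF_TO_LF, GUARD_LF_TO_CRLF, GUARD_DAMAGED };

/* Inspect the bytes at p (avail bytes readable) where \r\n\032\n is expected. */
enum guard_status check_guard(const unsigned char *p, size_t avail)
{
    static const unsigned char intact[]   = { '\r', '\n', 0x1A, '\n' };
    /* "\r\n" was replaced by "\n" (text-mode read, or a tool like dos2unix) */
    static const unsigned char lost_cr[]  = { '\n', 0x1A, '\n' };
    /* '\r' was inserted before each '\n' (text-mode write) */
    static const unsigned char extra_cr[] = { '\r', '\r', '\n', 0x1A, '\r', '\n' };

    if (avail >= sizeof intact && memcmp(p, intact, sizeof intact) == 0)
        return GUARD_OK;
    if (avail >= sizeof lost_cr && memcmp(p, lost_cr, sizeof lost_cr) == 0)
        return GUARD_CRLF_TO_LF;
    if (avail >= sizeof extra_cr && memcmp(p, extra_cr, sizeof extra_cr) == 0)
        return GUARD_LF_TO_CRLF;
    return GUARD_DAMAGED;
}
```

Each non-OK result can be mapped to a specific error message, e.g. "file looks like it was read or written in text mode".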
To cite PeteBlackerThe3rd:
It will allow the decoder to throw useful error messages in that case as opposed to failing mysteriously.