3

In the Designing File Formats link that I got from this website, I noticed that PNG has a `\r\n\x1A\n` chunk that is meant for "testing" carriage return and line feed conversion.

I am building custom binary structures for a project, and I am wondering why this is useful and in which scenarios I should think about adding it.

Danilo
  • Where in the linked document do you find anything about that "testing" chunk? – Some programmer dude Jul 09 '19 at 11:25
  • Designing File formats > Identification Bytes > fourth paragraph. – Danilo Jul 09 '19 at 11:26
  • 2
    Ah that bit. It's not about a testing chunk or something like that. It's for a way to make sure the file handling functionality can be trusted to do what it's supposed to do. The `\r\n` sequence should not be translated, and neither should the `\n`, they should be read exactly as that. – Some programmer dude Jul 09 '19 at 11:31
  • OK, and I get that they shouldn't be translated. But the way I see it, PNG doesn't actually work with any file types that require newlines, so this must be for some sort of system operation. That is why I asked in which scenario I should think about adding it. – Danilo Jul 09 '19 at 11:35
  • 3
    It looks like a way for the PNG decoder to check that an upstream process hasn't opened the file in ASCII mode as opposed to binary mode. It's quite a clever idea that I've not come across before. It will allow the decoder to throw useful error messages in that case as opposed to failing mysteriously. – PeteBlackerThe3rd Jul 09 '19 at 11:44
  • 2
    Remember what happens with binary files on Windows if you open the file with `fopen(name, "r");`: all the `\r\n`s are silently replaced by `\n`, possibly breaking the binary contents. I believe it's something like a test pattern. If it's not exactly `\r\n\032\n`, a previously applied tool might have damaged the contents. – Scheff's Cat Jul 09 '19 at 11:46
  • I once answered a question where something similar happened to a BMP: [SO: Copying a bmp in c](https://stackoverflow.com/a/46477480/7478597). ;-) – Scheff's Cat Jul 09 '19 at 11:48
  • Owwww.... cool. Dammit, that is smart! Would any of you mind posting this as an answer so I can accept it? And should we perhaps notify the mods to merge these two questions? – Danilo Jul 09 '19 at 11:51

1 Answer

4

For historical reasons, different OSes use distinct sequences to mark line endings in text files:

  • Unix and companions: \n (line feed)
  • DOS and Windows: \r\n (carriage return, line feed)
  • Mac OS (before Mac OS X): \r (carriage return). Mac OS X, which got a BSD Unix kernel, might support both: A Line Break Is a Line Break.

This is all a mess, e.g.:

  • Sometimes Windows text files look a bit strange in XEmacs, with every line decorated with a ^M at the end.
  • Windows Notepad (the included plain text editor) shows Linux text files as one single line.

Once you switch regularly between different OSes, you get used to the fact that line endings have to be fixed from time to time. There are numerous helper tools for this, e.g. unix2dos and dos2unix in Cygwin, special commands in Notepad++, prompts in Visual Studio, etc.

In C, a line ending is always denoted by \n, even on DOS and Windows. (I have no experience with Mac OS, but I would be surprised if it weren't the same there.) To make this work seamlessly, MS decided to "fix" file contents during reading and writing "under the hood". While reading a file, all occurrences of \r\n are silently replaced by \n, while writing inserts a \r before each written \n.

This has some annoying drawbacks:

  1. If a file of a certain size is read, the "received" contents might be some bytes smaller. (I once stumbled over this when I tried to reserve space before loading a file and then read the whole contents at once. I wondered why some bytes seemed to be missing after loading.)

  2. This may break the loading of binary files, where \n simply represents a byte with the value 10 that can mean anything (not necessarily a line break).
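Both drawbacks can be reproduced on any platform with a small simulation. The following is only a sketch: `simulate_text_mode_read` is a hypothetical helper (not a library function) that mimics the \r\n to \n replacement described above on an in-memory buffer; a real text-mode read may do more than this.

```c
#include <stddef.h>

/* Hypothetical helper: mimics the line-ending translation of a
 * text-mode read by collapsing every "\r\n" pair in src into a
 * single "\n". A lone '\r' or a lone '\n' is copied unchanged.
 * dst must have room for at least n bytes.
 * Returns the number of bytes written to dst. */
size_t simulate_text_mode_read(const unsigned char *src, size_t n,
                               unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Drop a CR that is immediately followed by an LF. */
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;
        dst[out++] = src[i];
    }
    return out;
}
```

Feeding it binary data that happens to contain the bytes 13, 10 returns fewer bytes than went in, which is exactly the "missing bytes" effect from item 1 and the corruption from item 2.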

To fix this, the C API provides additional modes for file I/O. E.g. fopen() supports, beyond r, w, and a, an extra character to indicate the file type:

  • b denotes binary I/O (don't touch contents)
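  • t denotes text I/O (fix line endings). Note that t is an extension offered by some runtimes such as Microsoft's; it is not part of standard C.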

Without any of them, the default is text I/O.

On Windows, as well as for portable file I/O, the mode should always be given explicitly. (On Linux, it simply has no effect and, in particular, does no harm.)
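As a minimal sketch of what "always give the mode" looks like in practice, the two helpers below write and read a byte buffer with an explicit `b` in the mode string. The names `write_binary` and `read_binary` are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes n bytes to path in binary mode.
 * Returns 0 on success, -1 on failure. */
int write_binary(const char *path, const unsigned char *data, size_t n)
{
    FILE *f = fopen(path, "wb");  /* 'b': do not touch the contents */
    if (!f)
        return -1;
    size_t written = fwrite(data, 1, n, f);
    int close_err = fclose(f);
    return (written == n && close_err == 0) ? 0 : -1;
}

/* Hypothetical helper: reads up to cap bytes from path in binary mode.
 * Returns the number of bytes read, or -1 on failure. */
long read_binary(const char *path, unsigned char *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");  /* 'b': do not touch the contents */
    if (!f)
        return -1;
    size_t got = fread(buf, 1, cap, f);
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : (long)got;
}
```

With these modes, a buffer containing \r\n or a plain \n should come back byte for byte, on Windows as well as on Linux.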

I once wrote an answer to SO: Copying a bmp in c, where a dump of a broken BMP file nicely illustrated the effect of file output done in the wrong mode.

After this long story about text and binary file I/O, it should be clear that this is always a potential issue for developers dealing with image data (which is usually binary encoded).

Hence, I can imagine that the \r\n\032\n sequence is simply a test pattern for this. If these 4 bytes don't have exactly these values, chances are good that

  • the file was opened in the wrong mode (on a platform where this is relevant), or
  • a previous tool damaged the contents of the file.

To cite PeteBlackerThe3rd:

It will allow the decoder to throw useful error messages in that case as opposed to failing mysteriously.
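A decoder-side check along these lines might look like the sketch below. It compares the first 8 bytes against the PNG signature (0x89, "PNG", then \r\n\032\n) and distinguishes "this is not a PNG at all" from "it starts like a PNG but the test pattern is damaged". The enum and the function `check_png_signature` are hypothetical names for this illustration, not part of any PNG library; a custom format could apply the same idea with its own magic bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum sig_status {
    SIG_OK,                   /* all 8 signature bytes match */
    SIG_TOO_SHORT,            /* fewer than 4 bytes available */
    SIG_NOT_PNG,              /* the first 4 bytes are not 0x89 "PNG" */
    SIG_LINE_ENDINGS_DAMAGED  /* starts like a PNG, test pattern broken */
};

/* Checks the first n bytes of buf against the PNG signature. */
enum sig_status check_png_signature(const unsigned char *buf, size_t n)
{
    static const unsigned char expected[8] = {
        0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    if (n < 4)
        return SIG_TOO_SHORT;
    if (memcmp(buf, expected, 4) != 0)
        return SIG_NOT_PNG;
    /* The identification part matches; now verify the test pattern. */
    if (n >= 8 && memcmp(buf + 4, expected + 4, 4) == 0)
        return SIG_OK;
    /* Text-mode I/O or a conversion tool probably touched the file. */
    return SIG_LINE_ENDINGS_DAMAGED;
}
```

On `SIG_LINE_ENDINGS_DAMAGED`, a decoder can report something like "file was probably opened or transferred in text mode" instead of failing later with an unrelated error.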

Scheff's Cat