19

I am doing image processing in a scientific context. Whenever I need to save an image to the hard drive, I want to be able to reopen it at a later time and get exactly the data that I had before saving it. I exclusively use the PNG format, having always been under the impression that it is a lossless format. Is this always correct, provided I am not using the wrong bit-depth? Should encoder and decoder play no role at all? Specifically, the images I save

  • are present as 2D numpy arrays
  • have integer values from 0 to 255
  • are encoded with the OpenCV `imwrite()` function, e.g. `cv2.imwrite("image.png", array)`
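
For illustration, a minimal round trip of this scenario might look like the sketch below (the file name and the random placeholder array are not from the question; note that `cv2.IMREAD_UNCHANGED` is used on reload, because `imread` converts to 3-channel BGR by default):

```python
import cv2
import numpy as np

# Placeholder for the real data: a 2D array of 8-bit integers (0-255)
array = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)

cv2.imwrite("image.png", array)

# IMREAD_UNCHANGED keeps the image single-channel instead of converting to BGR
restored = cv2.imread("image.png", cv2.IMREAD_UNCHANGED)

# The reloaded array should match the original exactly
assert np.array_equal(array, restored)
```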
smcs
  • 5
    I do not know of any way that you could get anything other than *"pixel perfect"* data in that scenario. – Mark Setchell Dec 19 '17 at 10:42
  • 4
    If you are in doubt, load the image again, compute the absdiff, and test whether any resulting pixel is non-zero, for a good number of sample images (see the sketch after these comments). – Micka Dec 19 '17 at 10:58
  • @Micka Good idea, but it would be heuristic after all. There is a lot of information out there, and people told me PNG can be lossy in this case or that, that it's only good for gray values or certain textures, etc... I was left somewhat confused and was hoping for a definitive answer for at least my special case :) – smcs Dec 19 '17 at 11:09
  • 2
    My current information is that PNG compression is absolutely _never lossy_, but the user can screw it up by cramming too many bits per pixel into the format, resulting in loss of color/value range. – smcs Dec 19 '17 at 11:12
  • Use TIFF format? – percusse Dec 19 '17 at 13:45
  • 2
  • @percusse To what advantage? AFAIK, there isn't _one_ TIFF format. It seems like you have to look very closely at what you're doing when using it. The English Wikipedia page lists over 20 different compression modes, some lossless, some lossy. On a second look, apparently only 5 of those are used frequently. Still, it seems like a very complex format with many versions and degrees of freedom. – smcs Dec 19 '17 at 13:55
  • It's the storage format of choice for professional photographers, next to .RAW files, for example. – percusse Dec 19 '17 at 14:05
  • 2
    There is no benefit to using TIFF here - it just complicates things and adds dependencies. In fact, I would go the other way and take the simplest possible format, which doesn't support compression - namely one of the NetPBM formats, e.g. `PGM` for greyscale or `PPM` for colour - especially as **OpenCV** can read/write that without any library dependencies. Plus they also support 16-bit if higher colour resolution becomes necessary later... https://en.wikipedia.org/wiki/Netpbm_format – Mark Setchell Dec 19 '17 at 14:21
  • @MarkSetchell Thanks, I hadn't heard of those before. I do want to use compression though, since I'm saving a large number of large images. Otherwise I would just write the arrays to the hard drive using `numpy.save()` without having to rely on any additional image reader/writer :) – smcs Dec 19 '17 at 14:29
  • @speedymcs Are you also worrying about checksums etc.? – jtlz2 Aug 24 '18 at 11:58
  • @jtlz2 I'm not familiar with them in this context... should I? – smcs Aug 24 '18 at 14:26
  • @speedymcs It was more that you said you wanted to ensure that - in the scientific context - you get the right data back. At one point we used checksums to maintain data integrity through hard-drive round trips - but perhaps I am taking your question too far :) – jtlz2 Aug 24 '18 at 14:30
  • @jtlz2 Ah, I see :) I think it's good for comparing two files that should be identical, like an original and a copy, say a download from the web, but when compressing an image file, the checksum changes of course. I guess decompressing the new file and comparing the SSD between pixel values, similar to what was proposed in the second comment, could be seen as a kind of checksum in this context though. – smcs Aug 24 '18 at 14:37
  • Good point re checksum changing - red herring - sorry! :) I sort of meant just maintaining the integrity of the compressed file: memory -> disk -> memory. I suppose in astronomy at least sometimes people worry about bit flips, but it's actually probably a minority sport in the end – jtlz2 Aug 24 '18 at 15:44
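
A sketch of the round-trip check Micka suggests in the comments above, run over a batch of sample images (the helper name and the random test data are purely illustrative):

```python
import cv2
import numpy as np

def roundtrip_is_exact(array, path):
    """Write `array` as PNG, reload it, and report whether any pixel differs."""
    cv2.imwrite(path, array)
    restored = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    diff = cv2.absdiff(array, restored)   # per-pixel absolute difference
    return cv2.countNonZero(diff) == 0    # zero non-zero pixels => identical

# Check a good number of representative sample images (random data here)
samples = [np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)
           for _ in range(10)]
assert all(roundtrip_is_exact(img, f"check_{i}.png")
           for i, img in enumerate(samples))
```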

2 Answers

17

PNG is a lossless format by design:

Since PNG's compression is fully lossless--and since it supports up to 48-bit truecolor or 16-bit grayscale--saving, restoring and re-saving an image will not degrade its quality, unlike standard JPEG (even at its highest quality settings).

The encoder and decoder should not matter with regard to reading the images correctly (assuming, of course, they're not buggy).

And unlike TIFF, the PNG specification leaves no room for implementors to pick and choose what features they'll support; the result is that a PNG image saved in one app is readable in any other PNG-supporting application.
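
To illustrate that point (this check is not part of the answer itself, and it assumes Pillow is available as a second, independent decoder): an 8-bit grayscale PNG written by OpenCV should read back bit-for-bit identically through another implementation.

```python
import cv2
import numpy as np
from PIL import Image

array = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)

# Encode with OpenCV ...
cv2.imwrite("image.png", array)

# ... and decode with a completely different library (Pillow)
restored = np.array(Image.open("image.png"))

assert np.array_equal(array, restored)
```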

Dan Mašek
  • Thanks! Encoders seem to vary dramatically in terms of compression rate though, with the standard one, which is also used by OpenCV, performing much worse than others: https://stackoverflow.com/a/12216875/5522601 – smcs Dec 20 '17 at 12:23
15

While PNG is lossless, that does not mean it is uncompressed by default.

I specify the compression using the `IMWRITE_PNG_COMPRESSION` flag. It varies between 0 (no compression) and 9 (maximum compression). So if you want an uncompressed PNG:

cv2.imwrite(filename, data, [cv2.IMWRITE_PNG_COMPRESSION, 0])

The more you compress, the longer it takes to save.
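
A rough sketch of that trade-off (not part of the answer; the test image, timings, and sizes are purely illustrative and will vary with the machine and the image content):

```python
import os
import time
import cv2
import numpy as np

array = np.random.randint(0, 256, size=(2048, 2048), dtype=np.uint8)

for level in (0, 1, 5, 9):
    start = time.perf_counter()
    cv2.imwrite("test.png", array, [cv2.IMWRITE_PNG_COMPRESSION, level])
    elapsed = time.perf_counter() - start
    size_kib = os.path.getsize("test.png") / 1024
    print(f"level {level}: {elapsed:.3f} s, {size_kib:.0f} KiB")
```

Whichever level is chosen, decoding gives back identical pixels; only file size and save time change.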

Link to docs

eric