
I am writing a library to generate PDF reports using prawn reports.

One of the features I want my gem to offer is a way to test report generation.

The problem is that two visually equal PDFs can differ at the byte level.

Is there a way to make sure that 2 visually equal PDF have the same bits in the file? Something like XML canonicalization.

Ricardo Acras

1 Answer


'Visual equality' (or 'visual similarity', where only a small percentage of pixels differs on each page) of two different PDFs can occur even if the internal structure of their PDF objects is very different. (Think of a page of 'text', which may use real fonts or may use 'outline' vector graphics for each glyph's shape...)

That means this equality can only be determined by rendering both files to page images at the same resolution and then comparing the two image sets pixel by pixel. The result of the comparison could be another image that shows all differing pixels in red or, if you prefer, just the number of pixels that do not agree.
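The comparison step could be sketched roughly as follows. This is only an illustration, not the method from the linked answer: it assumes the pages have already been rasterized (for example by Ghostscript) and loaded as nested lists of RGB tuples. The names `diff_pages`, `visually_equal` and `tolerance` are made up for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels for one rendered page

RED: Pixel = (255, 0, 0)
WHITE: Pixel = (255, 255, 255)


def diff_pages(page_a: Image, page_b: Image) -> Tuple[int, Image]:
    """Return (number of differing pixels, diff image with mismatches in red)."""
    same_shape = len(page_a) == len(page_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(page_a, page_b)
    )
    if not same_shape:
        raise ValueError("pages must be rendered at the same size and resolution")

    mismatches = 0
    diff: Image = []
    for row_a, row_b in zip(page_a, page_b):
        diff_row = []
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a == pixel_b:
                diff_row.append(WHITE)
            else:
                diff_row.append(RED)
                mismatches += 1
        diff.append(diff_row)
    return mismatches, diff


def visually_equal(pages_a: List[Image], pages_b: List[Image],
                   tolerance: float = 0.0) -> bool:
    """True if the page counts match and, on every page, the fraction of
    differing pixels does not exceed `tolerance` (hypothetical parameter)."""
    if len(pages_a) != len(pages_b):
        return False
    for page_a, page_b in zip(pages_a, pages_b):
        mismatches, _ = diff_pages(page_a, page_b)
        total = sum(len(row) for row in page_a)
        if total and mismatches / total > tolerance:
            return False
    return True
```

With `tolerance=0.0` this checks strict visual equality; a small positive value gives the 'visual similarity' variant mentioned above.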

I've described a scriptable way to do this, using Ghostscript, pdftk and ImageMagick, in this answer:

Alternatively, you may have a look at

(which is available for Linux, Unix, Mac OS X and Windows): it can also compare two PDF files visually.


[ Your literal question was this: "Is there a way to make sure that 2 visually equal PDF have the same bits in the file?" However, I'm not sure you really meant it that way, hence my answer above. Otherwise I'd have to say: if two PDF files are visually equal, just compute the MD5 sum of each to determine whether they have the same bits. ]
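For the literal, byte-level reading of the question, a minimal sketch of the checksum comparison might look like this. The helper names `file_md5` and `same_bits` are invented for the example, and MD5 is used here only as a cheap equality check, not for security.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(path_a: str, path_b: str) -> bool:
    """True if both files have the same MD5 digest, i.e. (barring collisions) the same bytes."""
    return file_md5(path_a) == file_md5(path_b)
```

Two PDFs that look the same on screen will often fail this check, which is the reason for the rendering-based comparison above.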

Kurt Pfeifle