
We have to compare about 1500 PDFs in one folder with 1500 PDFs in another to check for visual differences. We have found DiffPDF (and comparePDF, its command-line version) for Windows, which is a lot faster than our automated Acrobat Pro comparisons.

So far I have used:

comparepdf -v=2 -c=a old.pdf new.pdf

but the problem with this is that it just returns "these files are different". Does anyone know of a way to save the differences as output from the command line? You can do this from the GUI, but that would mean using something like TestComplete to automate it :(

Or are there better ways of comparing two PDFs visually, with output/highlighting of the differences?

Bonus points for C# .NET libraries.

JustAnotherDeveloper
  • A few people have asked me to add some kind of difference output option to comparepdf for when pairs of files differ. This is on my todo list. However, it isn't likely to be done anytime soon. I use DiffPDF myself and only created comparepdf to satisfy people who kept asking for a command line version. And now people keep asking for more features for comparepdf :-) Of course, sponsorship could buy some of my time to add such things! – Mark Summerfield Jun 23 '12 at 16:52
  • Our current system automates GUI use so could use DiffPDF if the commandline option doesn't work out :) comparePDF is really fast though, impressed so far! – JustAnotherDeveloper Jun 27 '12 at 09:10

2 Answers


You could have a look at these answers to similar questions:

However, I have no idea whether any of these would perform faster than your automated Acrobat Pro comparison does... Let me know if you find out, will you?

Shortcut:

For simplicity, let's assume the input files to be compared are similar enough and that each is only 1 page long. (For multi-page input, expand on the basic idea of this answer...)

The two most essential commands any such comparison boils down to are these:

compare.exe ^
    %input1% ^
    %input2% ^
    -compose src ^
    %output%.tmp.pdf

and

pdftk.exe ^
    %output%.tmp.pdf ^
    background %input1% ^
    output %output%.pdf
  • The first command generates a PDF with all differing pixels colored red. (The default resolution, 72 dpi, is used here. For a finer-grained view of the pixel differences, add -density 200 (that is, 200 dpi) or higher -- but processing time and the disk space needed for the output will increase accordingly.)
  • The second command tries to merge the resulting PDF with a background taken from %input1%.

Optionally, you may add -verbose -debug coder after the compare command for a better idea about what's going on.
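As a rough illustration, the two commands can be assembled and run from a script. This is a minimal Python sketch, not a tested pipeline: the function names build_commands and run_pair are hypothetical, it assumes compare and pdftk are on the PATH under those names (newer ImageMagick releases may expose compare as a sub-command of magick instead), and it assumes -density is placed before the input files so that it applies while the PDFs are being rasterized.

```python
import subprocess


def build_commands(input1, input2, output, density=None):
    """Return (compare_cmd, pdftk_cmd) argument lists for one pair of 1-page PDFs."""
    tmp = f"{output}.tmp.pdf"
    compare_cmd = ["compare"]
    if density is not None:
        # Assumption: the setting goes before the inputs so it is used when reading them.
        compare_cmd += ["-density", str(density)]
    compare_cmd += [str(input1), str(input2), "-compose", "src", tmp]
    # Put the first input behind the red difference pixels as context.
    pdftk_cmd = ["pdftk", tmp, "background", str(input1), "output", f"{output}.pdf"]
    return compare_cmd, pdftk_cmd


def run_pair(input1, input2, output, density=None):
    """Run both steps for one pair; requires ImageMagick, Ghostscript and pdftk."""
    compare_cmd, pdftk_cmd = build_commands(input1, input2, output, density)
    # compare may exit with a non-zero status when the inputs differ,
    # so its return code is not treated as a failure here.
    subprocess.run(compare_cmd)
    subprocess.run(pdftk_cmd, check=True)
```

Calling run_pair("old.pdf", "new.pdf", "diff", density=200) would then leave diff.tmp.pdf and diff.pdf next to the script, provided the tools are installed.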

compare.exe is a command-line tool from the great, great ImageMagick family of utilities (available for Linux, Windows, Unix and Mac OS X). However, it requires a Ghostscript installation as a 'delegate' in order to process PDF input. pdftk.exe is also a command-line utility, available for the same platforms. Both are Free Software.

After the first command, you'll have an output file which has only red pixels where there are differences found on the page.

After the second command, you'll have an output with all red 'diff' pixels in the context of the first input PDF.
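For the two-folder case in the question, the two commands would be run once per pair of files. The pairing step might look like the sketch below, which assumes that an old file and its new counterpart share the same file name; the function name pair_pdfs is hypothetical.

```python
from pathlib import Path


def pair_pdfs(old_dir, new_dir):
    """Match PDFs in two folders by file name.

    Returns (pairs, unmatched): pairs is a list of (old_path, new_path),
    unmatched lists the names present in only one of the folders.
    """
    old = {p.name: p for p in Path(old_dir).glob("*.pdf")}
    new = {p.name: p for p in Path(new_dir).glob("*.pdf")}
    common = sorted(old.keys() & new.keys())
    pairs = [(old[name], new[name]) for name in common]
    unmatched = sorted(old.keys() ^ new.keys())
    return pairs, unmatched
```

Each returned pair can then be fed to the compare and pdftk commands above, and the unmatched names reported separately so that missing files are not mistaken for identical ones.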

Example output:

Here are screenshots of two 1-page PDF files with differences in their content:

[Screenshots: Example PDF file 1; Example PDF file 2]


Here are screenshots of the output produced by the two commands above:

  • The left one shows the intermediate result (after first command), with only the difference pixels displaying as red (identical pixels being white).
  • The screenshot on the right shows the red difference pixels, but this time with the input PDF file number 1 as a (gray) background (after second command).

[Screenshots: red difference pixels only, identical pixels are white; red difference pixels with PDF file 1 as background context]


(PDF input files courtesy of Mark Summerfield, author of the beautiful DiffPDF tool.)

Kurt Pfeifle
  • Thanks! I've been away since Friday but I'll take a look at this today and let you know if it's suitable. I did a few tests on command line and diffPDF compared the PDF at least twice as fast - not sure about the differences when providing output though. I'll mark this as an answer this evening :) – JustAnotherDeveloper Jun 27 '12 at 09:02

I had the same problem. diffpdf is quick and nice, but it is GUI-only. comparepdf is a console tool, but it reports only an exit code (not the diff itself). diff-pdf has both a console mode and diff.pdf output, but it is slow and its output is not friendly.

I have tried to add the required code to diffpdf, you can find it here: http://github.com/taurus-forever/diffpdf-console
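If an exit code per pair is enough for a first pass, the result of each comparepdf run can at least be logged to a file. The following Python sketch rests on several assumptions: the flags are written in the style of the command in the question and should be checked against your comparepdf version's built-in help, an exit code of 0 is taken to mean "no differences detected", and the names run_comparepdf and log_results are hypothetical.

```python
import csv
import subprocess


def run_comparepdf(old_pdf, new_pdf):
    """Run comparepdf on one pair and return its exit code."""
    # Assumption: flags follow the style used in the question; verify locally.
    cmd = ["comparepdf", "-v=2", "-c=a", str(old_pdf), str(new_pdf)]
    return subprocess.run(cmd).returncode


def log_results(pairs, csv_path, runner=run_comparepdf):
    """Compare every (old, new) pair and write one CSV row per pair."""
    rows = []
    for old_pdf, new_pdf in pairs:
        code = runner(old_pdf, new_pdf)
        # Assumption: 0 means no differences; anything else is a difference or an error.
        status = "same" if code == 0 else "different or error"
        rows.append([str(old_pdf), str(new_pdf), code, status])
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["old", "new", "exit_code", "status"])
        writer.writerows(rows)
    return rows
```

Only the pairs flagged as different would then need the slower step that produces a visual diff.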