I am looking for sample code or a third-party tool for VB.NET that can visually compare a TIFF file and a PDF file and return true or false.

My requirement was to convert TIFF files to PDF, which I did using iTextSharp, but I now need to prove, using a VB.NET program, that nothing changed during the conversion. (Why? I have no idea, but I need to provide them with such a service.)

Let me know if you know of any such tool. I have been searching, but all I found were tools that convert one format to another or compare files of the same format.

venu
  • I don't think you could prove programmatically that two different file formats produce the same visual. You could print to PCL and compare those files, but just because the files are not identical does not mean the printed image is not identical to the human eye. – paparazzo Aug 06 '12 at 19:22
  • Yeah, that is exactly what I said, but I was asked to find out whether there is a way. – venu Aug 07 '12 at 18:04

2 Answers

You could try re-extracting the TIFF from the PDF and comparing the raw data of that image with the raw data of your original TIFF file.

Since the PDF format supports embedding TIFF files, your customer probably just wants to make sure that you did not recompress the images to some other format and lose quality in the process. It is a reasonable concern.

Getting the raw data from your image file:

Since you are using iText, for single-page TIFF files you might be able to get this data by using the method Image.rawData(). You can create an instance of this Image class from your TIFF file by using the method TiffImage.getTiffImage.

Getting the raw data from your PDF file:

You can follow the process explained here; you can then get the raw data by using the method PdfReader.GetStreamBytes.

You can compare the streams byte by byte, or you can save them to file while creating your PDF so that you can use them for a comparison later on using a command line tool, or you can compute an MD5 hash and use that instead.
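As a rough sketch of the byte-by-byte and MD5 options (written in Python purely for illustration, not VB.NET; `md5_hex`, `streams_match` and `matches_stored_digest` are hypothetical helper names, and the byte strings stand in for the raw data you would obtain from iText):

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the raw image data as a hex string."""
    return hashlib.md5(data).hexdigest()


def streams_match(original: bytes, extracted: bytes) -> bool:
    """Byte-by-byte comparison of the two raw data streams."""
    return original == extracted


def matches_stored_digest(extracted: bytes, stored_digest: str) -> bool:
    """Compare against a digest that was saved when the PDF was created."""
    return md5_hex(extracted) == stored_digest


if __name__ == "__main__":
    # Placeholder bytes; in practice these would be the TIFF raw data and
    # the stream bytes re-extracted from the PDF.
    tiff_raw = b"example raw image data"
    pdf_raw = b"example raw image data"
    print(streams_match(tiff_raw, pdf_raw))
    print(matches_stored_digest(pdf_raw, md5_hex(tiff_raw)))
```

Storing only the digest at conversion time keeps the later check cheap, at the cost of not being able to tell where the data differs.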

I have not tested this approach, but I believe it will work since there is no TIFF metadata involved.

yms
  • Yeah, sounds like a good option, but could you show me a sample or a tool that can compare TIFF files? I know bit-compare tools can compare PDFs, but I am not sure about TIFF. – venu Aug 06 '12 at 16:00

ImageMagick's compare command can do that very easily.

 compare file.tif file.pdf -compose src delta.pdf

or, assuming multipage TIFFs and multipage PDF, comparing page by page:

 compare file.tif[0] file.pdf[0] -compose src delta_page1.pdf
 compare file.tif[1] file.pdf[1] -compose src delta_page2.pdf
 compare file.tif[2] file.pdf[2] -compose src delta_page3.pdf
 [....]

(ImageMagick's indexing of pages/images starts with [0], not [1]!).
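If the page count is known, the per-page commands can be generated rather than typed out. The following is only a sketch: `page_compare_commands` is a hypothetical helper, the page count is assumed to be supplied by the caller, and the argument lists simply mirror the commands above.

```python
def page_compare_commands(tiff_path, pdf_path, page_count):
    """Build one ImageMagick 'compare' argument list per page.

    Page indexes are zero-based, matching ImageMagick's [0], [1], ...
    The delta file names are one-based, as in the commands above.
    """
    commands = []
    for index in range(page_count):
        commands.append([
            "compare",
            f"{tiff_path}[{index}]",
            f"{pdf_path}[{index}]",
            "-compose", "src",
            f"delta_page{index + 1}.pdf",
        ])
    return commands


if __name__ == "__main__":
    for command in page_compare_commands("file.tif", "file.pdf", 3):
        print(" ".join(command))
```

Each list could be handed to `subprocess.run` on a machine where ImageMagick (and Ghostscript, for the PDF side) is installed.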

Understanding the delta.pdf:

  1. The resulting delta.pdf will be completely white if there is no visual difference.
  2. The differing pixels will be red.
  3. The resulting file will use the default 72dpi resolution, which may miss very small pixel differences.

You can even simplify the command like this:

 compare file.tif file.pdf delta.pdf

The resulting delta.pdf will show (for context) the first file from the command line as a light gray background image, and overlay the differences as red pixels. Of course, in theory you can also reverse the order for each of the commands:

 compare file.pdf file.tif delta.pdf

However, you should be aware that PDF backgrounds that appear white are very often actually transparent, whereas TIFF backgrounds are real white. This will lead to a lot of pixel differences showing up. Better to stick with the order I named first :-)

Note 1: All these comparisons assume (of course) the same page image dimensions and aspect ratios. (Otherwise you may need to scale one of the two page images first.)

Note 2: You will almost always discover minor pixel differences, depending on your overall processing chain. It all depends on what kind of errors you want to uncover with this comparison. There are quite a few ways to fine-tune this...

Note 3: If this approach works in principle for you, you can modify the output format: you do not really need the visual difference as a "red pixel image". You could instead count the white (equal) and red (differing) pixels, then, based on the percentage of red compared to white, decide whether this is 'good' or 'bad' and finally return 'true' or 'false' accordingly (example command shown for two PDFs instead of one PDF and one TIFF):

Sample command:

compare \
   http://qtrac.eu/boson1.pdf[1] http://qtrac.eu/boson2.pdf[1] -compose src \
  -define histogram:unique-colors=true \
  -format %c \
   histogram:info:-

Sample output:

 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)

This output lends itself well to automatic unit testing. You can evaluate the two numbers, easily compute the "red pixel" vs. "white pixel" ratio and then decide to return PASSED or FAILED based on a certain threshold (if you don't strictly need "zero red" pixels).
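A minimal sketch of that evaluation, assuming the histogram text has the same shape as the sample output above. `count_pixels` and `passes` are hypothetical helpers; the white level of 65535 assumes a Q16 build, and other builds or versions may print the values differently.

```python
import re

# Matches the leading "count: (r, g, b, a)" part of a histogram line.
LINE_RE = re.compile(r"^\s*(\d+):\s*\(([^)]*)\)")

SAMPLE_OUTPUT = """
 56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)
"""


def count_pixels(histogram_text, white_level=65535):
    """Return (equal, differing) pixel counts from the histogram text.

    A color counts as "equal" when its first three channels are all at
    white_level (assumption: Q16 build); anything else is "differing".
    """
    equal = differing = 0
    for line in histogram_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        count = int(match.group(1))
        channels = [int(value) for value in match.group(2).split(",")]
        if all(channel == white_level for channel in channels[:3]):
            equal += count
        else:
            differing += count
    return equal, differing


def passes(histogram_text, max_differing_ratio=0.0):
    """True if the share of differing pixels is within the threshold."""
    equal, differing = count_pixels(histogram_text)
    total = equal + differing
    if total == 0:
        raise ValueError("no histogram lines found")
    return differing / total <= max_differing_ratio


if __name__ == "__main__":
    print(count_pixels(SAMPLE_OUTPUT))
    print(passes(SAMPLE_OUTPUT, max_differing_ratio=0.12))
```

The default threshold of zero corresponds to the strict "zero red pixels" case; a looser threshold is a judgment call that depends on the processing chain.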

Kurt Pfeifle
  • I just tried it with 'ImageMagick-6.7.8-Q16'. The samples you provided here work well for TIFF-to-TIFF comparison, but when I try TIFF to PDF, it simply says the PDF file is not available. – venu Aug 07 '12 at 16:52
  • @venu: ImageMagick cannot process PDFs itself; it requires a Ghostscript installation, which it can use as its 'delegate'. Maybe that is the issue? – Kurt Pfeifle Aug 07 '12 at 20:14
  • @venu: Ah, maybe your ImageMagick cannot access the PDFs over HTTP. Then you must download these two files by other means, and run the command with local access to the files: `compare boson1.pdf[1] boson2.pdf[1] -compose src ...` – Kurt Pfeifle Aug 07 '12 at 20:16
  • Both the TIFF and PDF files are on my local machine. – venu Aug 07 '12 at 21:16