I am using CFDOCUMENT to create a PDF in CF 9.0.1. However, even with identical input, each PDF that CFDOCUMENT generates has a different MD5 hash.

Test code is simple:

<cfdocument name=FileData1 format="PDF" localurl="yes" pagetype="A4"><h3>I am happy!</h3></cfdocument>
<cfdocument name=FileData2 format="PDF" localurl="yes" pagetype="A4"><h3>I am happy!</h3></cfdocument>
<cffile ACTION="write" FILE="C:\happy1.pdf" OUTPUT=#FileData1# ADDNEWLINE=NO NAMECONFLICT="Override">
<cffile ACTION="write" FILE="C:\happy2.pdf" OUTPUT=#FileData2# ADDNEWLINE=NO NAMECONFLICT="Override">

The two files have different MD5 hashes, although both PDFs look exactly the same. I have a user requirement to skip regenerating a PDF when the file would be the same, so does anyone know how to force CF9 to generate a bit-for-bit identical PDF (and therefore the same MD5 hash) when given the same input?

I ran a file compare in the HxD hex editor and found that the files differ in three places:

  • The font name e.g. 62176/FontName/OJSSWJ+TimesNewRomanPS (the OJSSWJ is random)
  • The timestamp /CreationDate(D:20110927152929+08'00')
  • Some sort of key at the end: <]/Info 12 0 R/Size 13>>
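The same byte-level comparison can be scripted rather than done by hand in a hex editor. The following is a minimal Python sketch, not part of the original test; the function name `differing_ranges` and the `gap` parameter are hypothetical, and it compares the files position by position, so it is only meaningful when both files are laid out the same way.

```python
def differing_ranges(a: bytes, b: bytes, gap: int = 8):
    """Return (start, end) byte ranges (end exclusive) where a and b differ.

    Differences separated by fewer than `gap` matching bytes are merged
    into one range so that a changed field shows up as a single region.
    """
    ranges = []
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if ranges and i - ranges[-1][1] < gap:
                ranges[-1][1] = i + 1      # extend the previous region
            else:
                ranges.append([i, i + 1])  # start a new region
    if len(a) != len(b):
        # Any trailing bytes present in only one input count as a difference.
        ranges.append([common, max(len(a), len(b))])
    return [tuple(r) for r in ranges]
```

Reading both PDFs with `open(path, "rb").read()` and printing the bytes around each returned range shows which fields change between runs.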

Thanks for your help in advance!

andrwo

1 Answer

They will never be the same.

The timestamp /CreationDate(D:20110927152929+08'00')

The CreationDate is a timestamp of when the PDF was created, so unless you create it at the same second every time, it won't be the same.

You might be able to post-process the PDF and remove or overwrite this field.

Alternatively, use a different method to decide whether the PDF needs to be created; generating it just to compare MD5 hashes seems like a waste of processing power.

Dale Fraser
  • Thanks for your comments, but I was hoping there is a way to force CF to generate an identical PDF from the same HTML content (e.g. by specifying a fixed creation date, if possible). My example grossly simplifies the requirement: the files are generated on the fly and sent across multiple parties, and we need to track incoming files to make sure each one is not a duplicate. – andrwo Sep 29 '11 at 07:07
  • When the files come in, strip out the parts of the PDF that change, then MD5 the rest and check for duplicates that way. – Dale Fraser Sep 30 '11 at 13:52
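
The "strip the changing parts, then hash the rest" idea from the last comment could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a tested solution for CFDOCUMENT output. The three patterns simply mirror the regions reported in the question: the date entries, the six-letter font subset prefix, and a trailer `/ID` array, which is assumed to be the "key at the end". The function name `normalized_pdf_md5` is hypothetical. If the embedded font data or any other stream also varies between runs, more patterns would be needed.

```python
import hashlib
import re

# (pattern, replacement) pairs for the regions assumed to vary between runs.
VOLATILE_PATTERNS = [
    # /CreationDate(D:20110927152929+08'00') and the matching /ModDate entry
    (re.compile(rb"/(CreationDate|ModDate)\s*\([^)]*\)"), rb"/\1()"),
    # Random six-letter subset prefix, e.g. /OJSSWJ+TimesNewRomanPS
    (re.compile(rb"/[A-Z]{6}\+"), rb"/AAAAAA+"),
    # Trailer file identifier, e.g. /ID [<hex><hex>] (assumed form)
    (re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"), rb"/ID[]"),
]


def normalized_pdf_md5(data: bytes) -> str:
    """MD5 of the PDF bytes after blanking the volatile fields.

    The normalized bytes are used only for hashing; they are not
    guaranteed to be a valid PDF and should never be written back.
    """
    for pattern, replacement in VOLATILE_PATTERNS:
        data = pattern.sub(replacement, data)
    return hashlib.md5(data).hexdigest()
```

On the receiving side, an incoming file would be read in binary mode, passed through `normalized_pdf_md5`, and the result looked up in the set of hashes already seen. Because the original bytes are left untouched, the stored PDF stays intact while the duplicate check ignores the fields that change on every run.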