I am in the process of writing a Ruby script/app that helps me compile LaTeX to (at least) PDF. One feature I want it to have is that it should run `pdflatex` iteratively until the PDF converges (as it should, I guess).
The idea is to compare the PDF generated in one iteration against the one from the previous iteration using their fingerprints. In particular, I currently use `Digest::MD5.file(.)`.
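The iteration described above might look like the following minimal sketch. `compile_until_stable` is a hypothetical helper, not an existing library method: it runs a caller-supplied block (which would invoke `pdflatex`) and stops once two consecutive runs yield the same fingerprint. The fingerprint defaults to `Digest::MD5.file(...).hexdigest`, as in the question, but is passed in as a parameter so that a timestamp-insensitive variant can be swapped in; with the plain MD5 default it is still subject to the timestamp problem.

```ruby
require "digest"

# Hypothetical helper (not part of any library). Calls the block once per
# run (the block is expected to invoke pdflatex) and stops as soon as the
# fingerprint of pdf_path equals the one from the previous run.
# Returns the number of runs performed, or nil if max_runs was exhausted
# without two consecutive runs producing the same fingerprint.
def compile_until_stable(pdf_path, max_runs: 5,
                         fingerprint: ->(path) { Digest::MD5.file(path).hexdigest })
  previous = nil
  1.upto(max_runs) do |run|
    yield run                              # e.g. run pdflatex once
    current = fingerprint.call(pdf_path)
    return run if current == previous      # converged: same output twice
    previous = current
  end
  nil                                      # gave up after max_runs runs
end
```

For example: `compile_until_stable("main.pdf") { system("pdflatex", "-interaction=nonstopmode", "main.tex") or raise "pdflatex failed" }` (file names hypothetical). Returning `nil` instead of raising leaves the decision of what to do after `max_runs` to the caller.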
The problem now is that this never converges. A likely (and, I hope, the only) culprit is the PDF's timestamp, which `pdflatex` records with a resolution of at least seconds. Since a `pdflatex` run typically takes longer than one second, the result keeps changing. That is, I expect the PDFs to be identical up to the timestamp(s) after some point. This assumption might be wrong; hints appreciated.
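One way to check the assumption that consecutive outputs differ only in their timestamps is to keep a copy of the PDF from the previous run and list the byte ranges where the two files differ. A minimal sketch, with hypothetical helper names `differing_ranges` and `show_differences`: it compares the files position by position, which is only informative while both files have the same layout. If something earlier in the file changes length, every byte after that point is reported as different, which is itself a hint that more than a fixed-width timestamp has changed.

```ruby
# Hypothetical diagnostic helper: returns the byte ranges (end-exclusive)
# at which the two files differ, comparing them position by position.
# Bytes past the end of the shorter file count as differing.
def differing_ranges(path_a, path_b)
  a = File.binread(path_a)
  b = File.binread(path_b)
  ranges = []
  start = nil
  length = [a.bytesize, b.bytesize].max
  length.times do |i|
    if a.getbyte(i) != b.getbyte(i)        # getbyte returns nil past the end
      start ||= i
    elsif start
      ranges << (start...i)
      start = nil
    end
  end
  ranges << (start...length) if start
  ranges
end

# Prints each differing range with some surrounding bytes from the first
# file, so you can see whether it lies inside a date string or elsewhere.
def show_differences(path_a, path_b, context: 20)
  data = File.binread(path_a)
  differing_ranges(path_a, path_b).each do |range|
    from = [range.begin - context, 0].max
    snippet = data.byteslice(from, range.size + 2 * context)
    puts "bytes #{range.begin}...#{range.end}: #{snippet.inspect}"
  end
end
```

For example, copy `main.pdf` to `previous.pdf` (e.g. with `FileUtils.cp`) before rerunning `pdflatex`, then call `show_differences("previous.pdf", "main.pdf")` (file names hypothetical).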
What can I do about this? My basic ideas so far:
- Use a library capable of doing the job
- Strip the metadata and hash only the PDF content
- Overwrite timestamps by a fixed value before comparing
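A sketch of the "overwrite timestamps by a fixed value" idea, assuming the volatile data is exactly the `/CreationDate` and `/ModDate` entries of the document information dictionary plus the `/ID` array in the trailer (as far as I know, pdfTeX derives the default `/ID` from the current time and file name, so it changes on every run too). The file is read as raw bytes, those entries are replaced by constants in memory, and only the result is hashed; the PDF on disk is not modified. `normalized_pdf_digest` and the two pattern constants are hypothetical names, and the regexes assume the entries are stored as plain text. They will miss dates stored inside compressed object streams or inside an XMP metadata stream (as some packages write), in which case the digest will still change between runs.

```ruby
require "digest"

# ASSUMPTION: the volatile entries appear as uncompressed plain text,
# e.g. /CreationDate (D:20240101120000+01'00') and /ID [<ab12...> <ab12...>].
DATE_PATTERN = %r{/(CreationDate|ModDate)\s*\(D:[^)]*\)}
ID_PATTERN = %r{/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]}

# Hypothetical helper: MD5 of the PDF bytes with dates and /ID replaced by
# constants, so two runs that differ only in those entries hash the same.
def normalized_pdf_digest(path)
  data = File.binread(path)                                  # raw bytes
  data = data.gsub(DATE_PATTERN) { "/#{$1} (D:00000000000000)" }
  data = data.gsub(ID_PATTERN, "/ID [<00> <00>]")
  Digest::MD5.hexdigest(data)
end
```

It can be used wherever `Digest::MD5.file(...).hexdigest` was used for the comparison. Because only an in-memory copy is rewritten, the replacement strings do not need to preserve byte offsets.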
Do you have more ideas, or even solutions? Solutions should use only free software that runs on Linux. Pure-Ruby solutions are preferred, but using external software is perfectly acceptable.
By the way, I do not know exactly how PDF is encoded, but I suspect that merely comparing the contained text won't work for me, since in later iterations only graphics or links might change.
Possibly related:
- How to compare two PDF files? (Messy, text-based or proprietary solutions)
- Functional PDF Testing (Uses a Java library; not clear whether it is up to the job)