
I need to insert a hyperlink into a few thousand existing PDFs. I'm working with zend_pdf, which apparently cannot set an invisible border. The only way I found to make the link borders invisible (I found it elsewhere on this site) is to search for each link "element" of the PDF and add a /Border entry, like this:

echo str_replace('/Annot /Subtype /Link', '/Annot /Subtype /Link /Border[0 0 0]', $pdf->render());

Since I need to work on files that reside on my filesystem, I'm using the sed command for the search & replace operation.
At first sight this works: the documents are displayed correctly by Acrobat 8, OS X 10.6's viewer and Ubuntu's document viewer. However, tools such as pdftk (1.41) and pdfinfo (0.12.1) report that the structure is corrupted. This is annoying because pdftk refuses to work on a file that contains errors, so no further manipulation of the PDF with it is possible. I looked into the files with a binary editor and found that if I add more than two bytes after "/Link", the file gets corrupted. This confuses me a lot: according to the PDF specification (I'm using 1.4) there is no checksum except for streams, which should mean that one can add as many bytes as one wants, as long as they are not inside a stream and the inserted bytes are valid PDF syntax. What am I missing here?

Here is an example:
the original pdf
the processed pdf

Maurizio
  • Have you run preflight on a document before editing to ensure it's your edit that's causing the problem? – Carey Gregory Sep 16 '11 at 20:39
  • @carey-gregory Sure I did. Since I'm using a script which is easy to modify, I also tried several different variations (e.g. adding a space after the last zero in /Border [0 0 0], no spaces at all between tags, etc.) to see if the problem was with the PDF syntax. As it turns out, the original PDF is valid, and it remains valid if I add up to two spaces after /Link. But if I add /Border etc. it becomes invalid. – Maurizio Sep 17 '11 at 07:45
  • Can you post an example of one of the edited documents? – Carey Gregory Sep 17 '11 at 20:07
  • PDF files are not text files, they are binary files with a rigid structure, so I am not surprised that a search&replace operation corrupts the file. What if the internal content of the file is compressed or encrypted? You need a library/API that allows you to open the file structure properly and change the attributes of the annotations you want to modify. – yms Sep 23 '11 at 13:59
  • [HaruAnnotation::setBorderStyle](http://www.php.net/manual/en/function.haruannotation-setborderstyle.php) for example looks promising – yms Sep 23 '11 at 14:03
  • @yms: I realized that adding the /Border shifts all the subsequent file contents by a few bytes, which actually corrupts the xref table. The xref table, which normally sits near the end of the file, references all the objects by their position in **bytes from the beginning of the file**. Since recalculating the full xref table for each modification seems too much of a hack, the way to go is definitely either trying to extend zend_pdf to support border properties, or finding other libraries that can do this. I'll give HaruAnnotation a try, thank you! – Maurizio Sep 23 '11 at 15:55
  • @yms: HaruAnnotation seems promising, but it is currently unable to work on existing PDFs. – Maurizio Sep 27 '11 at 08:38

1 Answer


Adding the "/Border" element to the file actually corrupts the PDF's xref table. The xref table references every object by its position, measured in bytes from the beginning of the file, so inserting the element shifts the position (offset) of all subsequent contents by a few bytes.
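
As a rough illustration of that shift, here is a minimal sketch on a toy two-object snippet (not a real PDF). The helper name `offset_of` is hypothetical, and the sketch assumes a grep that supports `-a`, `-b` and `-o` (GNU grep does). It records where the second object starts before and after the replacement; an xref entry written for the original file would still hold the old offset.

```shell
#!/bin/sh
# Hypothetical helper: print the byte offset of the first match of $1 in file $2.
offset_of() {
    LC_ALL=C grep -a -b -o "$1" "$2" | head -n 1 | cut -d: -f1
}

# Toy input: two "objects", the first one containing a link annotation.
demo=$(mktemp)
printf '1 0 obj\n<< /Type /Annot /Subtype /Link >>\nendobj\n2 0 obj\n<< >>\nendobj\n' > "$demo"

# Offset of object 2 in the untouched file (what an xref entry would store).
before=$(offset_of '2 0 obj' "$demo")

# Same search and replace as in the question, done byte-wise with sed.
LC_ALL=C sed 's|/Annot /Subtype /Link|/Annot /Subtype /Link /Border[0 0 0]|' \
    "$demo" > "$demo.edited"

# Offset of object 2 after the edit: it has moved by the length of the insertion.
after=$(offset_of '2 0 obj' "$demo.edited")

echo "object 2 moved from byte $before to byte $after"
rm -f "$demo" "$demo.edited"
```

Every object located after the edited annotation moves by the same amount, while the xref entries keep pointing at the old positions.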
To repair the xref table after the edit, I can use pdftk from PDF Labs (http://www.pdftk.com):

$ pdftk corrupted_file.pdf output fixed_file.pdf

As a matter of fact, I was not able to find a comprehensive PDF solution for PHP, and I had to use several different tools (zend_pdf, pdftk, sed) to implement my workflow.
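
The sed and pdftk steps of that workflow could be combined roughly as follows. This is only a sketch under assumptions: the function name `add_border_and_repair` is made up for illustration, GNU sed and pdftk are assumed to be on the PATH, and the PDFs are assumed to be unencrypted with a plain (uncompressed) xref table.

```shell
#!/bin/sh
# Sketch only: add_border_and_repair IN OUT
# Hypothetical helper; assumes GNU sed and pdftk are installed, and that the
# input is not encrypted and does not use compressed cross-reference streams.
add_border_and_repair() {
    in=$1
    out=$2
    tmp=$(mktemp) || return 1

    # Step 1: byte-wise search and replace. LC_ALL=C keeps sed from
    # interpreting the binary data as multibyte text.
    LC_ALL=C sed 's|/Annot /Subtype /Link|/Annot /Subtype /Link /Border[0 0 0]|g' \
        "$in" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Step 2: have pdftk read the edited file and write it back out,
    # which is the repair command shown above.
    pdftk "$tmp" output "$out"
    status=$?

    rm -f "$tmp"
    return $status
}
```

For a directory of files this could be called in a loop, for example `for f in *.pdf; do add_border_and_repair "$f" "fixed/$f"; done`, where `fixed/` is an assumed, already existing output directory.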

Maurizio
  • As I mentioned before, this approach will not work in many cases. It will fail for example if the xref table is compressed (see "3.4.7-Cross-Reference Streams" in PDF Reference sixth edition), or if the file is encrypted. – yms Sep 27 '11 at 12:20
  • @yms: I agree my approach is more than quick and dirty. I would have preferred to be able to do everything in a proper way with a single library. Unfortunately, such a library apparently doesn't exist yet. – Maurizio Sep 28 '11 at 07:10