I'm looking for a way to export the annotation layer of a PDF and merge it into another PDF. I've tried libraries like poppler and PyPDF2, but nothing has worked so far. Are there any open-source libraries that can do this?

1 Answer

Disclaimer: I am the author of pText, the library used in this example.

pText converts a PDF document to an internal JSON-like representation of nested lists, dictionaries and primitives. That means your question comes down to copying a dictionary from one JSON object to another. Should be pretty easy.
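To make the "copy a dictionary from one object to another" idea concrete, here is a minimal sketch that uses plain Python dicts and lists as a stand-in for a parsed PDF. The `Pages` and `Annots` layout and the `copy_annotations` helper are assumptions made for illustration only; they are not pText's actual types or API.

```python
import copy

# Toy stand-in for a parsed PDF: nested dicts and lists, not pText's real types.
def copy_annotations(src_doc, dst_doc, page_index=0):
    """Append the annotations of one page in src_doc to the same page in dst_doc."""
    src_page = src_doc["Pages"][page_index]
    dst_page = dst_doc["Pages"][page_index]
    # Create the annotation list only if the destination page lacks one,
    # so annotations already present in dst_doc are kept.
    dst_annots = dst_page.setdefault("Annots", [])
    for annot in src_page.get("Annots", []):
        # Deep copy so later changes to one document do not leak into the other.
        dst_annots.append(copy.deepcopy(annot))
    return dst_doc

doc_a = {"Pages": [{"Annots": [{"Subtype": "Highlight", "Rect": [10, 10, 100, 20]}]}]}
doc_b = {"Pages": [{"Contents": "..."}]}

copy_annotations(doc_a, doc_b)
print(doc_b["Pages"][0]["Annots"])
```

In a real PDF an annotation can point at other objects (its parent page or an appearance stream, for example), so an actual merge may need more care than this toy version suggests.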

You would need to read the first document:

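from ptext.pdf.pdf import PDF  # import path assumed; it may differ between pText versions
# Parse the document whose annotations will be copied
doc_in_a = None
with open("input_a.pdf", "rb") as in_file_handle:
    doc_in_a = PDF.loads(in_file_handle)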

Then you would need to read the second document:

doc_in_b = None
with open("input_b.pdf", "rb") as in_file_handle:
    doc_in_b = PDF.loads(in_file_handle)

And then add all annotations from a to b:

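from ptext.io.read.types import List, Name  # import path assumed; it may differ between pText versions

annots = doc_in_a.get_page(0).get_annotations()
# Note: this replaces any annotation list already present on page 0 of b
merged = List()
doc_in_b.get_page(0)[Name("Annots")] = merged
for a in annots:
    merged.append(a)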

Finally, write pdf b:

with open("output.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, doc_in_b)

You can obtain pText either on GitHub or from PyPI. There are a ton more examples; check them out to find out more about working with images.

Joris Schellekens