I'm looking for a way to export the annotation layer of a PDF and merge it into another PDF. I've tried libraries like poppler and PyPDF2, but nothing has worked so far. Are there any open-source libraries that can do this?
1 Answer
Disclaimer: I am the author of pText, the library used in this example.
pText converts a PDF document to an internal JSON-like representation of nested lists, dictionaries and primitives. That means your question comes down to copying a dictionary from one JSON object to another. Should be pretty easy.
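To make the "copy a dictionary" idea concrete, here is a minimal sketch that uses plain Python dicts and lists as stand-ins for two parsed documents. The structure and key names (`Pages`, `Contents`, `Annots`) are simplified assumptions for illustration, not the exact objects pText builds:

```python
import copy

# Hypothetical stand-ins for two parsed documents (plain dicts, not pText objects).
doc_a = {
    "Pages": [
        {
            "Contents": "page 1 of A",
            "Annots": [{"Subtype": "Highlight", "Rect": [10, 10, 50, 20]}],
        }
    ]
}
doc_b = {"Pages": [{"Contents": "page 1 of B"}]}

# Copy the annotation list of page 0 in A onto page 0 in B.
# deepcopy keeps the two documents independent of each other afterwards.
doc_b["Pages"][0]["Annots"] = copy.deepcopy(doc_a["Pages"][0].get("Annots", []))
```

The real thing follows the same pattern, only with pText's own list, dictionary and name types instead of the built-in ones.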
You would need to read the first document:
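from ptext.pdf.pdf import PDF  # assumed import path; check your pText version

# parse the document that holds the annotations
with open("input_a.pdf", "rb") as in_file_handle:
    doc_in_a = PDF.loads(in_file_handle)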
Then you would need to read the second document:
# parse the document that should receive the annotations
with open("input_b.pdf", "rb") as in_file_handle:
    doc_in_b = PDF.loads(in_file_handle)
And then add all annotations from a to b:
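from ptext.io.read.types import List, Name  # assumed import path; check your pText version

annots = doc_in_a.get_page(0).get_annotations()
# start with an empty annotation list on page 0 of b (this replaces any annotations b already had)
doc_in_b.get_page(0)[Name("Annots")] = List()
for a in annots:
    doc_in_b.get_page(0)[Name("Annots")].append(a)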
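Note that the snippet above assigns a fresh list, so any annotations page 0 of b already had are dropped. If you would rather keep them, you can append instead of replace. A rough sketch of that idea, using plain dicts as stand-ins for pages (the helper `append_annotations` is hypothetical and not part of pText):

```python
def append_annotations(src_page, dst_page, key="Annots"):
    """Append the annotations of src_page to dst_page, keeping what dst_page already has."""
    # setdefault creates the list only when dst_page has no annotations yet
    existing = dst_page.setdefault(key, [])
    existing.extend(src_page.get(key, []))
    return dst_page


# plain-dict stand-ins for two pages
page_a = {"Annots": ["highlight_a1", "note_a2"]}
page_b = {"Annots": ["link_b1"]}
page_c = {}  # a page without any annotations

append_annotations(page_a, page_b)
append_annotations(page_a, page_c)
```

With real pText pages you would do the same thing using the library's own key and list types.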
Finally, write PDF b:
with open("output.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, doc_in_b)
You can obtain pText either on GitHub or from PyPI. There are a ton more examples; check them out to find out more about working with images.

Joris Schellekens
Would this load the whole PDF content in memory? – Bordaigorl Mar 17 '21 at 16:13