Add text to existing PDF document in Python

Question

~~I'm trying to convert a PDF to a PNG of the same size as the PDF, which is an A4 page.~~

```shell
convert my_pdf.pdf -density 300x300 -page A4 my_png.png
```

The resulting PNG file, however, is 595 × 842 px, which is the size of an A4 page at 72 dpi rather than 300 dpi. I was planning to use PIL to write some text into some of the PDF's fields and then convert the image back to PDF, but at the moment the image comes out at the wrong resolution.
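For reference, 595 × 842 px is exactly A4 (210 × 297 mm) at 72 dpi. A minimal sketch of the arithmetic, assuming 25.4 mm per inch and rounding to the nearest pixel; the helper `a4_pixels` is hypothetical:

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # A4 width and height in millimetres


def a4_pixels(dpi):
    """Return the (width, height) in pixels of an A4 page rasterized at `dpi`."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)


print(a4_pixels(72))   # (595, 842)
print(a4_pixels(300))  # (2480, 3508)
```

So a 595 × 842 px PNG suggests the page was rasterized at ImageMagick's default 72 dpi. `-density` is a read setting, so it generally has to appear before the input file (e.g. `convert -density 300 my_pdf.pdf my_png.png`) to change how the PDF is rasterized.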

Edit: I was approaching the problem from the wrong angle. The correct approach didn't involve ImageMagick at all.

You're converting a text file to an image file to write text to it to convert back to a hybrid image / text format? There is __no way__ that's the best way to annotate a `.pdf` or fill out a `.pdf` form. — agf, Jul 25 '11 at 16:34
You are probably right. I cannot seem to find a proper way to modify an existing PDF in Python :/ — Uku Loskit, Jul 25 '11 at 16:37
possible duplicate of [Add text to Existing PDF using Python](http://stackoverflow.com/questions/1180115/add-text-to-existing-pdf-using-python) — bain, Jun 26 '14 at 12:41

score 34 · Answer 1 · edited May 23 '17 at 12:09

After searching around some more, I finally found the solution: it turns out that this was the correct approach after all. Still, I felt that it wasn't verbose enough. The poster probably took it from here (same variable names, etc.).

The idea: create a new, blank PDF with ReportLab that contains only a text string, then merge it onto the existing page as a watermark using pyPdf.

```python
# Python 2 code, using the original pyPdf package
from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
# create a new PDF with ReportLab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100, 100, "Hello world")
can.save()

# move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("mypdf.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page;
# note that only this first page is copied to the output
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("/home/joe/newpdf.pdf", "wb")
output.write(outputStream)
outputStream.close()
```

Hope this helps somebody else.

This solution is old and needs corrections. Refer to https://stackoverflow.com/questions/47573258/writing-text-over-a-pdf-in-python3 for corrections — Charalamm, May 31 '21 at 20:22

score 13 · Answer 2 · edited May 23 '17 at 12:34

I just tried the solution above, but I had quite a bit of trouble getting it to run in Python 3, so I would like to share my modifications. The adapted code looks as follows:

```python
# PyPDF2 1.x-era API: these camelCase names were deprecated in PyPDF2 2.x
# and no longer work in PyPDF2 3.x
from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()

# create a new PDF with ReportLab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100, 100, "Hello world")
can.save()

# move to the beginning of the BytesIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("mypdf.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page2 = new_pdf.getPage(0)
page.mergePage(page2)
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("newpdf.pdf", "wb")
output.write(outputStream)
outputStream.close()
```

Now `page.mergePage` throws an error. This turns out to be a porting bug in PyPDF2; please refer to this question for the solution: Porting to Python3: PyPDF2 mergePage() gives TypeError

score 5 · Accepted Answer · edited May 23 '17 at 12:17

You should look at "Add text to Existing PDF using Python" and also "Python as PDF Editing and Processing Framework". These will point you in the right direction.

If you do what you've proposed in the question, then when you export back to `.pdf`, the result will really just be an image embedded in a PDF; it won't be text.

answered Jul 25 '11 at 17:05 by agf

I accepted your answer because you made me reread that post (the first link), and that led to the solution. Thank you. – Uku Loskit Jul 25 '11 at 19:29
I +1'd you because now I've got a known-working script for when I need to do this myself :) – agf Jul 25 '11 at 20:11

score 2 · Answer 4 · Patrick Maupin · edited Oct 20 '15 at 23:14

pdfrw lets you take existing PDFs and place them as form XObjects (similar to images) on a ReportLab canvas. There are some examples of this in the examples/rl1 subdirectory of the pdfrw repository on GitHub. Disclaimer: I am the pdfrw author.

answered Jul 11 '15 at 04:42
