Add text to Existing PDF using Python

Question

I need to add some extra text to an existing PDF using Python. What is the best way to go about this, and what extra modules will I need to install?

Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only will do.

Edit: pypdf and ReportLab look good but neither one will allow me to edit an existing PDF, are there any other options?

PyPDF2 allows you to copy every page and [add a text annotation](https://pypdf2.readthedocs.io/en/latest/modules/AnnotationBuilder.html#PyPDF2.generic.AnnotationBuilder.text) on top. – Martin Thoma Dec 20 '22 at 18:01

score 180 · Answer 1 · edited Jan 20 '23 at 11:59

Example for Python 2.7:

from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)

# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

Example for Python 3.x:

# PdfReader/PdfWriter are the current names; PyPDF2 has deprecated
# PdfFileReader/PdfFileWriter (see the later answer on this page)
from PyPDF2 import PdfReader, PdfWriter
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

# move to the beginning of the BytesIO buffer
packet.seek(0)

# read the overlay PDF that ReportLab just wrote into the buffer
new_pdf = PdfReader(packet)
# read your existing PDF
existing_pdf = PdfReader(open("original.pdf", "rb"))
output = PdfWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.pages[0]
page.merge_page(new_pdf.pages[0])
# note: only this first page ends up in the output
output.add_page(page)
# finally, write "output" to a real file
output_stream = open("destination.pdf", "wb")
output.write(output_stream)
output_stream.close()

answered Jul 09 '13 at 00:16 by David Dehghan · edited Jan 20 '23 at 11:59 by Wtower

For python3, packet should be `io.BytesIO` and use PyPDF2 rather than pyPDF (which is unmaintained). Great answer! – Noufal Ibrahim Jun 23 '16 at 11:36
Thanks for sharing. It works great. One note: I believe it's better to use `open` instead of `file`. – mitenka Sep 05 '16 at 10:09
Careful: The new document only includes the first page of the original! It's easy enough to copy the rest of the pages from `existing_pdf` to `output`, the sample code just doesn't. – alexis Jul 25 '17 at 13:25
@alexis: How would you modify the code to put something on the second page of the pdf? I have a form that uses two pages and I am stuck on the first page. Thanks in advance. – DavidV Feb 12 '19 at 19:23
@alexis: I did and it does work (yesterday also worked but I had another problem to solve), but it's really really really really slow. Like 1 document per 10 seconds. And I need 250 of them. Any thoughts on how to fix that? Thanks. – DavidV Feb 13 '19 at 11:01
Mhh, get a faster computer? Buy a different solution from Adobe? I'm sorry but I'm not a regular user of this software, I don't even know why you are asking _me._ Profile your code to make sure the problem is in the pdf library, and take it from there. Or just go get lunch while your script runs, it'll be done before the hour is over (so, faster than you can code an improved solution.) – alexis Feb 13 '19 at 11:09
Is there a way to also ensure that fillable fields get merged? I'm trying this in combination with reportlab to add fillable fields to an existing document, but it seems because the merge only adds the pages themselves and not the fields, *nothing* in the merged document is fillable (including fields that were fillable in the original). How do I merge the fields as well? (I noticed that when *copying* a document, if I use `appendPagesFromReader` the copy isn't fillable, but if I use `cloneReaderDocumentRoot`, the copy is fillable. Is there a way to merge two roots?) – Katie Jun 14 '19 at 01:06
@DavidV substitute 0 with 1 – PythonProgrammi Apr 10 '20 at 08:33
page.mergePage(new_pdf.getPage(0)), here PageObject object has no attribute mergepage – Nilanj Sep 03 '21 at 11:16
@David Dehghan Thanks a lot. The code draws the string in the first page only. How can I draw the string to all the pages of the existing pdf? – YasserKhalil Sep 21 '21 at 07:29

score 106 · Accepted Answer · edited Dec 10 '18 at 17:43

I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I'd share:

1. Read your PDF using PdfFileReader(); we'll call this input.
2. Create a new PDF containing your text to add using ReportLab, and save this as a string object.
3. Read the string object using PdfFileReader(); we'll call this text.
4. Create a new PDF object using PdfFileWriter(); we'll call this output.
5. Iterate through input and apply .mergePage(text.getPage(0)) to each page you want the text added to, then use output.addPage() to add the modified pages to a new document.

This works well for simple text additions. See PyPDF's sample for watermarking a document.

Here is some code to answer the question below:

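import StringIO
from pyPdf import PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")  # draw whatever you need here
can.save()
packet.seek(0)
text = PdfFileReader(packet)

From here you can merge the first page of text onto the pages of your input document.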

I recommend using PyPDF2 since it is more updated, also check their sample code: https://github.com/mstamy2/PyPDF2/blob/41d90b4d141d0b019d145748f53ea556efcb47d1/Sample_Code/basic_features.py — blaze, Apr 23 '15 at 04:06
This code will create a new pdf file and will skip all metadata. So it's not appending to existing pdf. — Anton Kukoba, Apr 23 '18 at 11:56

score 19 · Answer 3 · edited Oct 20 '15 at 23:13

pdfrw will let you read in pages from an existing PDF and draw them to a reportlab canvas (similar to drawing an image). There are examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer: I am the pdfrw author.

answered Jul 11 '15 at 04:47 by Patrick Maupin · edited Oct 20 '15 at 23:13

FWIW, there are some more reportlab/pdfrw examples if you start following [this link](http://stackoverflow.com/questions/31712386/loading-matplotlib-object-into-reportlab). I answered there, based on an answer in the dupe target. – Patrick Maupin Aug 26 '15 at 13:54

score 8 · Answer 4 · answered Mar 05 '14 at 11:51

cpdf will do the job from the command-line. It isn't python, though (afaik):

cpdf -add-text "Line of text" input.pdf -o output.pdf
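
If you want to drive this from a Python script anyway, one option is to call the cpdf executable with subprocess. A minimal sketch, assuming cpdf is installed and on PATH; it uses only the -add-text and -o options from the command above, and both function names are hypothetical.

```python
import shutil
import subprocess


def cpdf_add_text_command(text, src, dest, executable="cpdf"):
    """Build the argument list for the cpdf call shown above."""
    return [executable, "-add-text", text, src, "-o", dest]


def add_text_with_cpdf(text, src, dest):
    """Run cpdf; raise if it is not installed or exits with an error."""
    if shutil.which("cpdf") is None:
        raise FileNotFoundError("cpdf was not found on PATH")
    # Passing a list (not a shell string) avoids quoting problems in `text`.
    subprocess.run(cpdf_add_text_command(text, src, dest), check=True)
```

Calling add_text_with_cpdf("Line of text", "input.pdf", "output.pdf") would then run the same command as the one-liner.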

answered Mar 05 '14 at 11:51 by user2243670

Carefully check the license for cpdf before using - it's not Open Source. – Tim Small Mar 22 '22 at 09:22

score 7 · Answer 5 · edited May 23 '17 at 12:02

Leveraging David Dehghan's answer above, the following works in Python 2.7.13:

from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger

import StringIO

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
# create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(290, 720, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader("original.pdf")
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

If the existing pdf has multiple pages, how do you ensure the output has the same number of pages with only difference being the edited page? Im hoping there is a simpler way without making weird loops — West, Feb 16 '22 at 00:33

score 2 · Answer 6 · answered Mar 08 '23 at 16:02

As of this writing, PyPDF2 has deprecated PdfFileReader, PdfFileWriter and a few other names in favour of new ones (PdfReader, PdfWriter), and has replaced methods such as getPage() with attributes of PdfReader (for example, pages).

Here is a very simple class for adding text to an existing PDF file (usage is shown at the end):

from PyPDF2 import PdfWriter, PdfReader, Transformation
import io
from reportlab.pdfgen.canvas import Canvas

class GenerateFromTemplate:
    def __init__(self,template):
        self.template_pdf = PdfReader(open(template, "rb"))
        self.template_page= self.template_pdf.pages[0]

        self.packet = io.BytesIO()
        self.c = Canvas(self.packet,pagesize=(self.template_page.mediabox.width,self.template_page.mediabox.height))

    
    def addText(self,text,point):
        self.c.drawString(point[0],point[1],text)

    def merge(self):
        self.c.save()
        self.packet.seek(0)
        result_pdf = PdfReader(self.packet)
        result = result_pdf.pages[0]

        self.output = PdfWriter()

        op = Transformation().rotate(0).translate(tx=0, ty=0)
        result.add_transformation(op)
        self.template_page.merge_page(result)
        self.output.add_page(self.template_page)
    
    def generate(self,dest):
        outputStream = open(dest,"wb")
        self.output.write(outputStream)
        outputStream.close()

"""
Use as:
gen = GenerateFromTemplate("template.pdf")
gen.addText("Hello!",(100,200))
gen.addText("PDF!",(100,300))
gen.merge()
gen.generate("Output.pdf")
"""

Hope this helps.
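
A note on the points passed to addText: they are ReportLab canvas coordinates, which by default are measured in points (1/72 inch) from the bottom-left corner of the page. If you measured a position in millimetres from the top-left corner instead, a small conversion helps. The function below is only an illustration; its name and parameters are made up and it is not part of ReportLab or PyPDF2.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_from_top_left(x_mm, y_mm, page_height_pt):
    """Convert a position in mm from the top-left corner to PDF points.

    Returns (x, y) in points measured from the bottom-left corner,
    which is what Canvas.drawString expects by default.
    """
    x_pt = x_mm / MM_PER_INCH * POINTS_PER_INCH
    # The y axis points up in PDF, so subtract from the page height.
    y_pt = page_height_pt - y_mm / MM_PER_INCH * POINTS_PER_INCH
    return x_pt, y_pt
```

With the class above, gen.addText("Hello!", mm_from_top_left(25.4, 25.4, 792)) would start the text one inch in from the left edge and one inch down from the top of a US Letter page (792 points tall).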

ConMan77 · Answer 7 · 2022-02-27T07:26:02.320

Don't use mergePage; it may not work for some PDFs. Use mergeRotatedTranslatedPage instead:

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen.canvas import Canvas

page_to_merge = 0 #Refers to the First page of PDF 
xcoor = 250 #To be changed according to your pdf
ycoor = 650 #To be changed according to your pdf

input_pdf = PdfFileReader(open("Source.pdf", "rb"))
page_count = input_pdf.getNumPages()
inputpdf_page_to_be_merged = input_pdf.getPage(page_to_merge)

packet = io.BytesIO()
c = Canvas(packet,pagesize=(inputpdf_page_to_be_merged.mediaBox.getWidth(),inputpdf_page_to_be_merged.mediaBox.getHeight()))
c.drawString(xcoor,ycoor,"Hello World")
c.save()
packet.seek(0)

overlay_pdf = PdfFileReader(packet)
overlay = overlay_pdf.getPage(0)

output = PdfFileWriter()

for PAGE in range(page_count):
    if PAGE == page_to_merge:
        inputpdf_page_to_be_merged.mergeRotatedTranslatedPage(overlay, 
                inputpdf_page_to_be_merged.get('/Rotate') or 0, 
                overlay.mediaBox.getWidth()/2, overlay.mediaBox.getWidth()/2)
        output.addPage(inputpdf_page_to_be_merged)
    
    else:
        Page_in_pdf = input_pdf.getPage(PAGE)
        output.addPage(Page_in_pdf)

outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

What version of PyPDF2 is this answer using? – thinker3 Mar 22 '23 at 06:29
@thinker3 The PyPDF2 version is 1.26.0. – ConMan77 Mar 22 '23 at 07:12

score -2 · Answer 8 · edited Sep 15 '14 at 13:01

If you're on Windows, this might work:

PDF Creator Pilot

There's also a whitepaper of a PDF creation and editing framework in Python. It's a little dated, but maybe can give you some useful info:

Using Python as PDF Editing and Processing Framework

answered Jul 24 '09 at 21:14 by thedz · edited Sep 15 '14 at 13:01 by Community

The white paper looks good but is a little light on code, and I don't really have the resource to implement a whole PDF framework myself! ;) – Frozenskys Jul 24 '09 at 21:22

score -3 · Answer 9 · answered Jul 24 '09 at 21:03

You may have better luck breaking the problem down into converting PDF into an editable format, writing your changes, then converting it back into PDF. I don't know of a library that lets you directly edit PDF but there are plenty of converters between DOC and PDF for example.

answered Jul 24 '09 at 21:03 by aehlke

Problem is that I only have the source in PDF (from a 3rd party) and PDF -> DOC -> PDF will lose a lot in the conversion. Also I need this to run on Linux so DOC may not be the best choice. – Frozenskys Jul 24 '09 at 21:08
I believe Adobe keeps PDF editing capability pretty closed and proprietary so that they can sell licenses for their better versions of Acrobat. Maybe you can find a way to automate the usage of Acrobat Pro to edit it, using some kind of macro interface. – aehlke Jul 24 '09 at 21:14
If the parts you want to write to are form fields, there are XML interfaces to editing them - otherwise I can't find anything. – aehlke Jul 24 '09 at 21:15
No I just wanted to add a few lines of text to each page. – Frozenskys Jul 24 '09 at 21:25
