Python: Numbering Pages in a PDF using PyPDF2 and io

Question

I am trying to add page numbers to an existing PDF file after the fact. I pieced the code together from here and here, but I don't fully understand how it works. I keep running into a problem I can't fix on my own, probably because I don't understand what is happening, even after reading the PyPDF2 documentation.

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4


packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=A4)    
can.drawString(10, 100, "Page" + str(15)) #just a random test number
can.save()
packet.seek(0)

watermark = PdfFileReader(packet)
watermark_page = watermark.getPage(0)

pdf = PdfFileReader('in.pdf')
pdf_writer = PdfFileWriter()

for page in range(pdf.getNumPages()):

    pdf_page = pdf.getPage(page)
    pdf_page.mergePage(watermark_page)
    pdf_writer.addPage(pdf_page)

with open('out.pdf', 'wb') as fh:
    pdf_writer.write(fh)

This works fine. However, I would like to give every page a different number. So I changed the for loop to this:

from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4

packet = io.BytesIO()

pdf = PdfFileReader('in.pdf')
pdf_writer = PdfFileWriter()

for page in range(pdf.getNumPages()):

    can = canvas.Canvas(packet, pagesize=A4)


    can.drawString(10, 200, "Page " + str(page) )
    can.save()
    packet.seek(0)
    watermark = PdfFileReader(packet)
    watermark_page = watermark.getPage(0)



    pdf_page = pdf.getPage(page)
    pdf_page.mergePage(watermark_page)
    pdf_writer.addPage(pdf_page)

with open('out.pdf', 'wb') as fh:
    pdf_writer.write(fh)

This does not work.

I get:

Traceback (most recent call last):

  File "<ipython-input-44-c6a76740be9f>", line 1, in <module>
    runfile('//DIR/pdftest.py', wdir='//DIR')

  File "C:\Program Files (x86)\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Program Files (x86)\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "//DIR/pdftest.py", line 55, in <module>
    watermark = PdfFileReader(packet)

  File "C:\Program Files (x86)\Anaconda\lib\site-packages\PyPDF2\pdf.py", line 1084, in __init__
    self.read(stream)

  File "C:\Program Files (x86)\Anaconda\lib\site-packages\PyPDF2\pdf.py", line 1901, in read
    raise utils.PdfReadError("Could not find xref table at specified location")

PdfReadError: Could not find xref table at specified location

A bit of help understanding as well as fixing this would be greatly appreciated.

Thank you!

The line `watermark = PdfFileReader(packet)` is invalid. You are reading from an empty `packet = io.BytesIO()`. — stovfl, Dec 20 '18 at 09:07
Your answer gave me the idea to initialize `packet = io.BytesIO()` in every loop iteration, which does the trick. I don't understand why, though. The way I understand your answer, since the object is empty, it shouldn't have worked in the first place. — Stefan, Dec 20 '18 at 12:17
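
A note on why the fresh buffer helps. This is a sketch under assumptions, not a confirmed diagnosis. The buffer is not empty when it is read, since `can.save()` fills it first. The likely problem is that a single `io.BytesIO` created outside the loop keeps the bytes from earlier iterations. Each new canvas writes at whatever position the previous read left behind, so the buffer no longer holds one clean PDF and its xref offsets point to the wrong place. The example below uses only the standard library. The helper names `reuse_buffer` and `fresh_buffer` are made up for illustration, and a plain `read()` stands in for `PdfFileReader`, whose exact final stream position is not asserted here.

```python
import io


def reuse_buffer(payloads):
    """Write every payload into ONE shared buffer, reading it back each time."""
    shared = io.BytesIO()
    snapshots = []
    for payload in payloads:
        shared.write(payload)            # writes at the current position
        shared.seek(0)                   # rewind, as in the question's loop
        snapshots.append(shared.read())  # reading moves the position to the end
    return snapshots


def fresh_buffer(payloads):
    """Create a new buffer per payload, as in the fix from the comments."""
    snapshots = []
    for payload in payloads:
        packet = io.BytesIO()            # starts empty on every iteration
        packet.write(payload)
        packet.seek(0)
        snapshots.append(packet.read())
    return snapshots


# Placeholder byte strings standing in for the PDFs that reportlab would write.
docs = [b"<pdf for page 0>", b"<pdf for page 1>"]

print(reuse_buffer(docs))  # second snapshot still contains the first document
print(fresh_buffer(docs))  # each snapshot contains exactly one document
```

In the question's loop, moving `packet = io.BytesIO()` inside the `for` block corresponds to `fresh_buffer`: every watermark page is parsed from a buffer that holds exactly one PDF.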
