
I am trying to replace text strings in a PDF file using the Python code below.

import PyPDF2
reader = PyPDF2.PdfFileReader('document.pdf', strict=True, warndest=None, overwriteWarnings=True)
writer = PyPDF2.PdfFileWriter()
replacements = {'old' : 'new'}

P = reader.getNumPages()
for p in range(P):
    page = reader.getPage(p)
    contents = page.getContents()
    bdata = contents.getData()
    ddata = bdata.decode('utf-8') #decoded data (string)  
    for key in replacements.keys():
        ddata = ddata.replace(key, replacements[key])
    
    contents.setData(ddata.encode('utf-8')) #Error occurs here
    
    #page.setContents(contents)
    writer.addPage(page)

with open("result.pdf", 'wb') as f:
    writer.write(f)

The problem is that contents.setData raises PdfReadError: Creating EncodedStreamObject is not currently supported.

Can anybody think of a workaround?

P.S. Applying the method described here did create a new PDF file, but without the replacements applied.

    As an aside: using UTF-8 to decode the content stream is a sure way to damage the stream data in many PDFs. – mkl Jan 06 '21 at 22:02
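
To illustrate the point in the comment above, here is a minimal sketch of why a UTF-8 round trip can damage stream data while Latin-1 does not. The byte string is made up for illustration and only stands in for a real content stream. Even with a lossless round trip, a plain string replace may still find nothing, because PDF text is often stored in font-specific encodings or split across several operators.

```python
# Illustrative bytes only: a stand-in for a PDF content stream that
# contains a non-ASCII byte (0xE9), as real streams often do.
raw = b"BT (caf\xe9 old) Tj ET"

# Strict UTF-8 decoding rejects the lone 0xE9 byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# errors="ignore" avoids the exception but silently drops the byte,
# so the re-encoded stream no longer matches the original.
lossy = raw.decode("utf-8", errors="ignore").encode("utf-8")

# Latin-1 maps each byte 0..255 to exactly one code point, so
# decoding and re-encoding is lossless for arbitrary bytes.
text = raw.decode("latin-1").replace("old", "new")
edited = text.encode("latin-1")

print(utf8_ok)       # False
print(lossy == raw)  # False
print(edited)        # b'BT (caf\xe9 new) Tj ET'
```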

1 Answer


As explained here, this isn't a good idea. You might consider building an HTML version of the page you want and then using wkhtmltopdf to convert it to PDF.
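
A rough sketch of that approach, assuming the wkhtmltopdf binary is installed and on PATH, and that the page content is available as an HTML template. The helper names (render_html, build_command, html_to_pdf) and the template string are hypothetical, chosen only for this example.

```python
import html
import shutil
import subprocess
from pathlib import Path


def render_html(template: str, replacements: dict) -> str:
    """Apply plain string replacements to an HTML template."""
    for old, new in replacements.items():
        # Escape the new text so it cannot break the markup.
        template = template.replace(old, html.escape(new))
    return template


def build_command(html_path: str, pdf_path: str) -> list:
    """Basic wkhtmltopdf invocation: input file, then output file."""
    return ["wkhtmltopdf", html_path, pdf_path]


def html_to_pdf(html_text: str, pdf_path: str) -> bool:
    """Write html_text next to pdf_path and convert it to PDF.

    Returns False if wkhtmltopdf is not found on PATH.
    """
    if shutil.which("wkhtmltopdf") is None:
        return False
    html_path = str(Path(pdf_path).with_suffix(".html"))
    Path(html_path).write_text(html_text, encoding="utf-8")
    subprocess.run(build_command(html_path, pdf_path), check=True)
    return True


# Hypothetical template standing in for the real page content.
page = render_html(
    "<html><body><p>old text</p></body></html>", {"old": "new"}
)
```

Calling html_to_pdf(page, "result.pdf") would then produce the PDF, provided wkhtmltopdf is available.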

ishahak