
The following code tries to replace part of the text in a PDF file:

from PyPDF2 import PdfReader, PdfWriter

replacements = [("Failed", "Passed")]

pdf = PdfReader(open("2.pdf", "rb"))
writer = PdfWriter()

for page in pdf.pages:
    contents = page.get_contents().get_data()
    #print(contents) old contents
    for (a, b) in replacements:
        contents = contents.replace(str.encode(a), str.encode(b))
    #print(contents) new contents which has 'Passed' as new value
    page.get_contents().set_data(str(contents)) #Issue occurs here
    writer.add_page(page)

with open("2_modified.pdf", "wb") as f:
    writer.write(f)

I keep getting the error below:

Traceback (most recent call last):
  File "/pdf_editor.py", line 14, in <module>
    page.get_contents().set_data(str(contents)) #Issue occurs here
  File "/venv/lib/python3.9/site-packages/PyPDF2/generic/_data_structures.py", line 839, in set_data
    raise PdfReadError("Creating EncodedStreamObject is not currently supported")
PyPDF2.errors.PdfReadError: Creating EncodedStreamObject is not currently supported

I tried the solutions mentioned here, which did not work. I also found this GitHub link, which has a "bug" label but no further updates.

UPDATE:
I had already tried the library suggested in the comments, but did not pursue it for two reasons:

  1. It does not seem to be widely used.
  2. I kept running into one issue or another, the last being an 'apply_redact_annotations' error.

Is there another workaround, or another good library, to achieve this?

Vinod
  • https://pymupdf.readthedocs.io/en/latest/about.html – Сергей Кох Mar 12 '23 at 15:43
  • The library above does not have much documentation. I tried it with fitz anyway, but I still get the error "RuntimeError: Directory 'static/' does not exist" – Vinod Mar 13 '23 at 07:32
  • What is your actual question? A question has a question mark at the end. Everything seems pretty clear to me: You're trying to do something with pypdf that pypdf does not support. You already linked the right Github issue. – Martin Thoma Mar 15 '23 at 21:28
  • This question is actually a duplicate of https://stackoverflow.com/q/31703037/562769 – Martin Thoma Mar 15 '23 at 21:30
  • @MartinThoma: Updated the question and the other SO link – Vinod Mar 16 '23 at 11:36

1 Answer


I am answering the question in light of the title. PyPDF2 (now merged back into pypdf) can decode encoded stream objects on the fly to read their data, but it does not encode implicitly when you write data back, which is why set_data fails on an EncodedStreamObject. It is probably possible to create encoded streams explicitly, but I find it easier to work on fully decoded documents. I like using qpdf --qdf in.pdf uncompressed.pdf for that.

By the way, "encoded" here essentially means "compressed" (Deflate, i.e. the /FlateDecode filter, is the most common choice).
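
To illustrate what "encoded" means in practice, here is a minimal sketch using only Python's standard zlib module, with no PDF library involved. The content stream bytes and the helper name replace_in_flate_stream are made up for illustration; a real page stream is longer and may use other filters. It only shows why the text has to be decoded before it can be replaced and re-encoded afterwards, which is the step PyPDF2 does not do for you.

```python
import zlib

# Hypothetical page content stream that draws the text "Failed".
raw_stream = b"BT /F1 12 Tf 72 720 Td (Failed) Tj ET"

# A Flate-encoded stream stores zlib (Deflate) compressed bytes instead.
encoded_stream = zlib.compress(raw_stream)


def replace_in_flate_stream(encoded: bytes, old: bytes, new: bytes) -> bytes:
    """Decode a Flate stream, replace bytes in it, and re-encode it.

    Illustrative helper only, not part of PyPDF2 or pypdf.
    """
    decoded = zlib.decompress(encoded)   # "decode" = decompress
    patched = decoded.replace(old, new)  # edit the plain content bytes
    return zlib.compress(patched)        # "encode" = compress again


patched_stream = replace_in_flate_stream(encoded_stream, b"Failed", b"Passed")
print(zlib.decompress(patched_stream))
```

After running qpdf --qdf, the streams in the file are already stored decoded, so the decompress and compress steps are not needed and a plain bytes replace is enough.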

Hermann