import PyPDF2
from PyDF2 import PdfFileReader, PdfFileWriter


file_path="sample.pdf"

pdf = PdfFileReader(file_path)


with open("sample.pdf", "w") as f:'

for page_num in range(pdf.numPages):
   
   pageObj = pdf.getPage(page_num)



   try:
       txt = pageObj.extractText()
       txt = DocumentInformation.author

   except:
       pass

   else:

       f.write(txt)
f.close()

Error Received: ModuleNotFoundError: No module named 'PyPDF2'

I'm writing my first ever script: I want to read in a PDF, extract its text, and write that text to a .txt file. I was trying to use PyPDF2, but I'm not sure how to use it in a script like this.

EDIT: I had success importing the os & sys like so.

import os
import sys

1 Answer


There are multiple issues:

  1. from PyDF2 import ...: A typo. You meant PyPDF2 instead of PyDF2
  2. PdfFileWriter was imported, but never used (side-note: It's PdfReader and PdfWriter in the latest version of PyPDF2)
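  3. with open("sample.pdf", "w") as f:': A syntax error (the stray ' after the colon). It also opens the PDF itself in write mode, which would overwrite it; you probably meant a .txt output file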
  4. Lacking indentation of the next lines
  5. Side-note: Did you know that you can simply write for page in pdf.pages?
  6. DocumentInformation.author is wrong. I guess you meant pdf.metadata.author
  7. You overwrite the txt variable - I don't understand why you don't use it before you re-assign it.

Maybe this is what you want:

from PyPDF2 import PdfReader

def get_text(pdf_file_path: str) -> str:
    text = ""
    reader = PdfReader(pdf_file_path)
    for page in reader.pages:
        text += page.extract_text()
    return text


text = get_text("example.pdf")

with open("example.txt", "w") as f:
    f.write(text)
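
If you would rather keep the page-by-page loop with error handling from your original script, here is a minimal sketch of that idea. The helper name write_pages_text is made up for illustration and is not part of PyPDF2; it only assumes that every page object has an extract_text() method, as the pages of a PdfReader do.

```python
from typing import Any, Iterable


def write_pages_text(pages: Iterable[Any], out_path: str) -> int:
    """Write the text of each page to out_path and return how many pages were written.

    Hypothetical helper: `pages` can be any iterable of objects that
    provide an extract_text() method, e.g. PdfReader(...).pages.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for number, page in enumerate(pages, start=1):
            try:
                text = page.extract_text()
            except Exception as exc:
                # Report the failing page instead of silently passing over it.
                print(f"Skipping page {number}: {exc}")
                continue
            f.write(text + "\n")
            written += 1
    return written
```

With PyPDF2 installed, you could call it as write_pages_text(PdfReader("example.pdf").pages, "example.txt"). Unlike a bare except: pass, a page that fails to extract is reported and the remaining pages are still written.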

Installation issues

In case you have installation issues, maybe the docs on installing PyPDF2 can help you?

If you execute your script in the console as python your_script_name.py you might want to check the output of

python -c "import PyPDF2; print(PyPDF2.__version__)"

That should show your PyPDF2 version. If it doesn't, the Python environment you're using doesn't have PyPDF2 installed. Please note that your system might have arbitrarily many Python environments.
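
To see which environment a script actually runs in, a small check along these lines may help. It is only a sketch using the standard library; the function name find_module_location is made up for illustration.

```python
import importlib.util
import sys
from typing import Optional


def find_module_location(module_name: str) -> Optional[str]:
    """Return where the current interpreter would load module_name from, or None if it is not installed."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # sys.executable is the interpreter that is running this script.
    print("Interpreter:", sys.executable)
    location = find_module_location("PyPDF2")
    if location is None:
        print("PyPDF2 is not installed for this interpreter.")
    else:
        print("PyPDF2 found at:", location)
```

If the interpreter path printed here differs from the one you installed PyPDF2 into, installing with python -m pip install PyPDF2 (using the same python you run the script with) targets the matching environment.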

Martin Thoma