I am a complete beginner with Python. I literally started last weekend. I am using Python 3.
I am trying to read text from a PDF file. I first tried PyPDF2, following the instructions in Automate the Boring Stuff, but the text I got back had no spaces between words and was therefore unusable. I then installed pdfminer3k by typing "pip install pdfminer3k" at the command line.
I then entered the following lines into the interpreter:
import pdfminer, os
base_path = "C:/Users/ross_"
my_file = os.path.join(base_path, "sample2.pdf")
log_file = os.path.join(base_path, "pdf_log.txt")
password = ""
extracted_text = ""
fp = open(my_file, "rb")
parser = PDFParser(fp)
document = PDFDocument(parser, password)
But the last line gave me this error message:
Traceback (most recent call last):
  File "", line 1, in <module>
    document = PDFDocument(parser, password)
NameError: name 'PDFDocument' is not defined
Does anyone have an idea why I get that error message? I thought PDFDocument would have been defined in the pdfminer module. More generally, how do I figure out stuff like this? Isn't there a resource somewhere that explains how to use modules like pdfminer? Many thanks, and apologies for my total ignorance.
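For reference, the same kind of NameError can be reproduced with the standard library alone. This is only a minimal sketch: the json package stands in for pdfminer, and the pdfminer import mentioned in the comments is an assumption about how pdfminer3k is laid out, not something confirmed above.

```python
import json
import pkgutil

# "import json" binds only the name "json" in the current namespace.
# Classes defined inside the package are not made available as bare names.
try:
    JSONDecoder  # bare name that was never imported
    bare_name_defined = True
except NameError:
    bare_name_defined = False

# Importing the class from the submodule that defines it makes the name usable.
# Assumption: the pdfminer3k equivalent would be something like
#   from pdfminer.pdfparser import PDFParser, PDFDocument
from json.decoder import JSONDecoder

# One way to find where a class might live: list the package's submodules,
# then look inside each one with dir() or help().
submodules = sorted(m.name for m in pkgutil.iter_modules(json.__path__))

print(bare_name_defined)
print(submodules)
print("JSONDecoder" in dir(json.decoder))
```

The same inspection should work on any installed package, for example by passing pdfminer.__path__ to pkgutil.iter_modules and then calling help() on the submodule that looks relevant.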