Please do not use "tika" for an answer. I have already tried answers from this question:
How to extract text from a PDF file?
I have this PDF file, https://drive.google.com/file/d/1aUfQAlvq5hA9kz2c9CyJADiY3KpY3-Vn/view?usp=sharing , and I would like to copy the text.
import PyPDF2
pdfFileObject = open('C:\\Path\\To\\Local\\File\\Test_PDF.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
count = pdfReader.numPages
for i in range(count):
page = pdfReader.getPage(i)
print(page.extractText())
The output is "Date Submitted: 2019-10-21 16:03:36.093 | Form Key: 5544" which is only part of the text. The next line of text starts with "Exhibit A to RFA...."