I am extracting the text from a .pdf file using PyPDF2 package. I am getting output but not as in it's desired form. I am unable to find where's the problem?
The code snippet is as follows:
import PyPDF2
def Read(startPage, endPage):
global text
text = []
cleanText = " "
pdfFileObj = open('F:\\Pen Drive 8 GB\\PDF\\Handbooks\\book1.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
num_pages = pdfReader.numPages
print(num_pages)
while (startPage <= endPage):
pageObj = pdfReader.getPage(startPage)
text += pageObj.extractText()
startPage += 1
pdfFileObj.close()
for myWord in text:
if myWord != '\n':
cleanText += myWord
text = cleanText.strip().split()
print(text)
Read(3, 3)
The output which I am getting at present is attached for the reference and which is as follows:
Any help is highly appreciated.