I currently have the following function:
import PyPDF2

def readFile(fileName):
    text = ""
    # PDFs are binary files, so open in 'rb' mode rather than 'rt'
    pdfFileObj = open(fileName, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    num_pages = pdfReader.numPages
    count = 0
    # Concatenate the extracted text of every page
    while count < num_pages:
        pageObj = pdfReader.getPage(count)
        text += pageObj.extractText()
        count += 1
    pdfFileObj.close()
    return text
But for most PDFs that I try this on, it returns one long string without any spaces between words or sentences. Am I doing something wrong, or is there a way to split up the words?
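
To illustrate what "splitting up the words" could look like after extraction, here is a minimal sketch of a dictionary-based segmenter. It is not part of PyPDF2: `segment_words` and the `vocabulary` set are hypothetical names introduced for this example, and it assumes a word list that covers the text is available.

```python
def segment_words(text, vocabulary):
    """Split a space-free string into known words; return None if no split exists."""
    n = len(text)
    # best[i] holds a list of words covering text[:i], or None if none was found
    best = [None] * (n + 1)
    best[0] = []
    # No candidate word can be longer than the longest vocabulary entry
    max_len = max((len(w) for w in vocabulary), default=0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[n]

# Hypothetical vocabulary, for illustration only
vocabulary = {"this", "is", "a", "test"}
words = segment_words("thisisatest", vocabulary)
print(" ".join(words))  # this is a test
```

This only works when every word is in the vocabulary, and it keeps the first split it finds rather than the most likely one, so ambiguous strings may come out wrong.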