I am trying to extract the words from a PDF and print each one on its own line, but so far I can only do this with text files, as demonstrated below.
There is also a constraint: I cannot convert the PDF to TXT first and then perform this operation. It must be done on the PDF file directly.
with open('filename.txt', 'r') as f:
    for line in f:
        for word in line.split():
            print(word)
If filename.txt contains just "Hello World!", then this code prints:
Hello
World!
I need to do the same with searchable PDF files as well. Any help would be appreciated.
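For reference, here is a minimal sketch of one possible direction, not a definitive solution. It assumes the third-party pypdf library is installed and that the PDF has a real text layer (i.e. it is searchable, not a scanned image). The names `words_from_pages` and `print_pdf_words` are hypothetical helpers introduced only for illustration. The text is read from the PDF in memory, so no TXT file is created.

```python
from typing import Iterable, Iterator


def words_from_pages(page_texts: Iterable[str]) -> Iterator[str]:
    """Yield whitespace-separated words from each page's text, in order."""
    for text in page_texts:
        for line in text.splitlines():
            for word in line.split():
                yield word


def print_pdf_words(path: str) -> None:
    """Print every word of a searchable PDF on its own line."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() reads each page's text layer directly from the PDF;
    # "or ''" guards against a page that yields no text.
    texts = ((page.extract_text() or "") for page in reader.pages)
    for word in words_from_pages(texts):
        print(word)
```

Calling `print_pdf_words('filename.pdf')` on a PDF that contains just "Hello World!" should give the same two lines as the text-file version, although how cleanly words are separated depends on how the PDF was produced.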