How can I get the content of a PDF file line by line in Python? I have searched Stack Overflow but could not find a good answer. Notes: pyPdf gives an assertion error; if possible, I would like something with slate and pdfminer.

Martin Thoma
user873286

1 Answer

Run pdfminer's pdf2txt.py from the command line:

python /path/to/pdf2txt.py -o text.txt /path/to/yourpdf.pdf

You can then open the text file it produces and read it line by line with a for line in file: loop.
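As a rough sketch of that loop: the helper name iter_text_lines is made up for illustration, and the UTF-8 encoding is an assumption about how the text file was written, so adjust it if your output differs.

```python
def iter_text_lines(path, encoding="utf-8"):
    """Yield each line of a text file without its trailing newline."""
    # The encoding default is an assumption; change it if the file differs.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


# Example use with the file produced by the command above:
#
#     for line in iter_text_lines("text.txt"):
#         print(line)
```

Iterating over the file object reads one line at a time, so even a large extracted text file is never loaded into memory all at once.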

If you want to be more efficient, you would have to modify pdf2txt.py so that outfp is an in-memory Python string buffer (a StringIO object) instead of a file, which avoids writing a file and then reading it back.
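Here is one way the in-memory idea might look. Note that this is a different route from editing pdf2txt.py: it assumes the pdfminer.six fork is installed and uses its extract_text_to_fp function. The names buffer_lines and pdf_lines are hypothetical helpers, not part of pdfminer.

```python
import io


def buffer_lines(buffer):
    """Rewind an in-memory text buffer and yield its lines without newlines."""
    buffer.seek(0)
    for line in buffer:
        yield line.rstrip("\n")


def pdf_lines(pdf_path):
    """Extract a PDF's text into memory and yield it line by line."""
    # Assumption: pdfminer.six is installed. Imported lazily so that
    # buffer_lines above stays usable without it.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    buffer = io.StringIO()  # plays the role of outfp, but never touches disk
    with open(pdf_path, "rb") as pdf_file:
        # LAParams() turns on layout analysis so text is grouped into lines.
        extract_text_to_fp(pdf_file, buffer, laparams=LAParams())
    yield from buffer_lines(buffer)


# Example use:
#
#     for line in pdf_lines("/path/to/yourpdf.pdf"):
#         print(line)
```

The whole extracted text still sits in memory at once, so for very large PDFs the file-based approach may be the safer choice.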

apple16