Extracting text from a PDF on Windows 8 using Python 3.5.0

Question

I want to extract text from a PDF file using Python 3.5.0 with the help of the slate package on Windows 8.

Problem: Although I installed the slate package successfully, I still get errors when I try to import slate. Please suggest what I am missing.

Errors:

Traceback (most recent call last):
  File "", line 1, in <module>
    import slate
  File "C:\Users\name\AppData\Local\Programs\Python\Python35-32\lib\site-packages\slate-0.4.1-py3.5.egg\slate\__init__.py", line 66, in <module>
    from slate import PDF
ImportError: cannot import name 'PDF'

score 3 · Answer 1 · answered Dec 28 '15 at 20:44

You could try pdftotext (Windows version) from the Poppler library.

As a standalone program, it doesn't require Python. But I often use it from Python as a subprocess, like this:

import subprocess

args = ['pdftotext', '-layout', '-q', 'input.pdf', '-']
txt = subprocess.check_output(args, universal_newlines=True)
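A slightly more defensive variant of the same call is sketched below. It uses the same `-layout` and `-q` options as above; the function name `pdf_to_text` and its `exe` parameter are made up for illustration. It assumes pdftotext is reachable via PATH and fails with a clear message when it is not.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path, exe="pdftotext"):
    """Return the text of pdf_path as produced by the pdftotext program.

    `exe` is the name (or full path) of the pdftotext executable.
    """
    # Check up front that the executable can be found, so a missing
    # Poppler install is reported clearly.
    if shutil.which(exe) is None:
        raise FileNotFoundError("{!r} was not found on PATH".format(exe))

    # '-' as the output file makes pdftotext write to stdout.
    args = [exe, "-layout", "-q", pdf_path, "-"]
    return subprocess.check_output(args, universal_newlines=True)


# Usage (assumes input.pdf exists in the current directory):
# txt = pdf_to_text("input.pdf")
```

If pdftotext exits with an error (for example on a damaged file), `subprocess.check_output` raises `subprocess.CalledProcessError`, which the caller can catch.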

score 2 · Answer 2 · answered Dec 28 '15 at 20:35

slate depends on PDFMiner, which does not support Python 3.

You can try to install it with:

pip install PDFMiner

I went with installing pdfminer3k (from PyPI), but it did not work well out of the box and its documentation wasn't good, so I looked a bit more and found this page of possible alternatives. Let me know if any of these work for you.

score 2 · Answer 3 · answered Feb 16 '17 at 10:52

You can install pdfminer.six:

pip install pdfminer.six

https://pypi.python.org/pypi/pdfminer.six/20160614
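Once it is installed, extraction might look like the sketch below. It assumes a reasonably recent pdfminer.six release that provides `pdfminer.high_level.extract_text`; older releases such as the 20160614 one linked above may not have it. The wrapper name `extract_pdf_text` is made up for illustration.

```python
def extract_pdf_text(pdf_path):
    """Return all text from pdf_path using pdfminer.six."""
    # Import lazily so a missing or outdated package produces a
    # readable error instead of failing at module import time.
    try:
        from pdfminer.high_level import extract_text
    except ImportError as exc:
        raise RuntimeError(
            "pdfminer.six is missing or too old; "
            "try: pip install --upgrade pdfminer.six"
        ) from exc

    return extract_text(pdf_path)


# Usage (assumes input.pdf exists in the current directory):
# print(extract_pdf_text("input.pdf"))
```

This stays entirely in Python, so no external program has to be on PATH.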

Bonson

