4

I want to extract text from a PDF file using Python 3.5.0 with the slate package on Windows 8.

Problem: Although I installed the slate package successfully, importing it raises an error. Please suggest what I am missing.

Errors:

Traceback (most recent call last):
  File "", line 1, in <module>
    import slate
  File "C:\Users\name\AppData\Local\Programs\Python\Python35-32\lib\site-packages\slate-0.4.1-py3.5.egg\slate\__init__.py", line 66, in <module>
    from slate import PDF
ImportError: cannot import name 'PDF'

ketan
B Singh

3 Answers

3

You could try pdftotext (a Windows version is available) from the Poppler library.

As a standalone program, it doesn't require Python. But I often use it from Python as a subprocess, like this:

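import subprocess

# -layout keeps the physical page layout, -q suppresses messages,
# and the final '-' writes the extracted text to standard output.
args = ['pdftotext', '-layout', '-q', 'input.pdf', '-']
# universal_newlines=True returns the output as str instead of bytes.
txt = subprocess.check_output(args, universal_newlines=True)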
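
If pdftotext is not on the PATH, or the PDF cannot be read, the call above raises an exception. A minimal sketch of a more defensive wrapper is shown below. The function names `build_args` and `pdf_to_text` are hypothetical helpers introduced only for illustration, and returning `None` on failure is an assumption about how you might want to handle errors.

```python
import shutil
import subprocess


def build_args(pdf_path, exe="pdftotext"):
    """Return the pdftotext command line; the trailing '-' sends text to stdout."""
    return [exe, "-layout", "-q", pdf_path, "-"]


def pdf_to_text(pdf_path, exe="pdftotext"):
    """Return the extracted text, or None if pdftotext is missing or fails."""
    if shutil.which(exe) is None:
        # Executable not found on PATH (for example, Poppler is not installed).
        return None
    try:
        return subprocess.check_output(
            build_args(pdf_path, exe), universal_newlines=True
        )
    except subprocess.CalledProcessError:
        # pdftotext exited with a non-zero status (unreadable or missing PDF).
        return None
```

Calling `pdf_to_text('input.pdf')` would then give either the text or `None`, so the caller can check the result instead of catching exceptions.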
Roland Smith
2

slate depends on PDFMiner, which does not support Python 3.

You can try to install it with:

pip install PDFMiner

I tried installing pdfminer3k (from PyPI), but it did not work well right away and its documentation was poor, so I looked further and found this page of possible alternatives. Let me know if any of these work for you.

ofer.sheffer
2

You can install pdfminer.six, a fork of PDFMiner that works with Python 3:

pip install pdfminer.six

https://pypi.python.org/pypi/pdfminer.six/20160614
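
Once it is installed, extracting text might look like the sketch below. It assumes a recent pdfminer.six release that provides `pdfminer.high_level.extract_text`; the 20160614 release linked above may not include it. The wrapper name `extract_pdf_text` is hypothetical, and returning `None` on failure is just one possible choice.

```python
def extract_pdf_text(pdf_path):
    """Return the text of pdf_path, or None if pdfminer.six or the file is unavailable."""
    try:
        # Imported lazily so a missing or too-old package yields None, not a crash.
        from pdfminer.high_level import extract_text
    except ImportError:
        return None
    try:
        return extract_text(pdf_path)
    except FileNotFoundError:
        # The given path does not exist.
        return None
```

For example, `text = extract_pdf_text('input.pdf')` would give the document text as a string when both the package and the file are available.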

Bonson