I'm playing with PDFs and images in a virtualenv, and I installed pdfminer-six, pypdf2, pytesseract and pdf2image with pip:
$ python3 -m venv testenv
$ source testenv/bin/activate
$ pip3 install pdfminer-six
$ pip3 install pypdf2
$ pip3 install pdf2image
$ pip3 install pytesseract
but then I ran into these import errors:
$ python3
>>> import pytesseract
Traceback (most recent call last):
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 22, in <module>
from PIL import Image
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/PIL/Image.py", line 94, in <module>
from . import _imaging as core
ImportError: cannot import name '_imaging' from 'PIL' (/home/userx/test_dir/testenv/lib/python3.7/site-packages/PIL/__init__.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/pytesseract/__init__.py", line 1, in <module>
from .pytesseract import ( # noqa: F401
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 24, in <module>
import Image
ModuleNotFoundError: No module named 'Image'
and
>>> from pdf2image import convert_from_path
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/pdf2image/__init__.py", line 5, in <module>
from .pdf2image import (
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/pdf2image/pdf2image.py", line 14, in <module>
from PIL import Image
File "/home/userx/test_dir/testenv/lib/python3.7/site-packages/PIL/Image.py", line 94, in <module>
from . import _imaging as core
ImportError: cannot import name '_imaging' from 'PIL' (/home/userx/test_dir/testenv/lib/python3.7/site-packages/PIL/__init__.py)
(That said, from PyPDF2 import PdfFileReader worked fine.)
I then installed Pillow via pip as well, as an attempt to fix the issue:
$ python3 -m pip install Pillow
but the import errors didn't go away. Any reason why _imaging is still missing? Thank you!
I'm on CentOS 8, and this is my current setup:
(testenv) [~/test_dir] python --version
Python 3.7.5+
(testenv) [~/test_dir] pip -V
pip 20.2.3 from /home/userx/test_dir/testenv/lib/python3.7/site-packages/pip (python 3.7)
(testenv) [~/test_dir] pip list
Package          Version
---------------- --------
cffi             1.14.3
chardet          3.0.4
cryptography     3.1.1
pdf2image        1.14.0
pdfminer.six     20200726
Pillow           7.2.0
pip              20.2.3
pycparser        2.20
PyPDF2           1.26.0
pytesseract      0.3.6
setuptools       41.2.0
six              1.15.0
sortedcontainers 2.2.2
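To narrow down why _imaging can't be imported, a check along these lines might show whether the compiled _imaging file is present in the PIL directory at all, and whether its build tag matches the interpreter in the venv. This is only a rough sketch: find_extension_files and report are helper names I made up for illustration, and comparing against EXT_SUFFIX is a heuristic, since an extension built with a plain .so suffix would also load.

```python
import glob
import importlib.util
import os
import sysconfig


def find_extension_files(package, stem):
    """Return (package_dir, compiled files named <stem>.*) without importing the package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None, []
    pkg_dir = list(spec.submodule_search_locations)[0]
    candidates = glob.glob(os.path.join(pkg_dir, stem + ".*"))
    # Keep only compiled extension modules (.so on Linux/macOS, .pyd on Windows).
    matches = sorted(p for p in candidates if p.endswith((".so", ".pyd")))
    return pkg_dir, matches


def report(package="PIL", stem="_imaging"):
    """Print what was found next to the suffix this interpreter builds extensions with."""
    expected = sysconfig.get_config_var("EXT_SUFFIX")
    pkg_dir, matches = find_extension_files(package, stem)
    print("interpreter extension suffix:", expected)
    print("package directory:", pkg_dir)
    for path in matches:
        name = os.path.basename(path)
        same = expected is not None and name == stem + expected
        note = "matches this interpreter" if same else "possibly built for another interpreter"
        print("  {} ({})".format(name, note))
    if not matches:
        print("  no compiled {} file found".format(stem))
    return matches


if __name__ == "__main__":
    report()
```

Running it with the venv's python (the same one that raises the ImportError) keeps the comparison meaningful, because EXT_SUFFIX is reported by whichever interpreter executes the script.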