I have found a few solutions here and elsewhere, but none of them has worked. Context: I am trying to get pdf2txt going on my Pop!_OS 22.04 LTS system. I am using Python 3.10.6, with no other versions present. The command line states that it requires python3-pdfminer to work, which I installed with apt. The output states that there is no module named 'pdfminer.high_level'. This comment here notes that it's a part of pdfminer.six, which can be installed using pip, using a dash instead of a dot if it's inside a virtual environment.
$ python3 -m pip install pdfminer.six
states that requirements are already satisfied. To be sure, I also switched to a virtualenv and installed it there:
$ pip install pdfminer-six
Running pdf2txt in both cases results in the same error, i.e.
File "/usr/bin/pdf2txt", line 9, in <module>
import pdfminer.high_level
ModuleNotFoundError: No module named 'pdfminer.high_level'
I then tried to uninstall and reinstall pdfminer.six, first on the system-wide installation. python3 -m pip uninstall pdfminer.six (or just pip3) was not allowed, so I made an error of judgment and used sudo. Reinstalling now shows:
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pdfminer.six in /usr/lib/python3/dist-packages (-VERSION-)
so I am not sure whether it is properly installed. I then tried the same in the virtualenv, where there were no problems, but the same error remains.
I reinstalled python3-pdfminer, but without success.
In the virtualenv, I found:
./lib/python3.10/site-packages/pdfminer/high_level.py
./lib/python3.10/site-packages/pdfminer/__pycache__/high_level.cpython-310.pyc
I then created a test Python file that imports pdfminer.high_level and ran it, with no problem. I then did the same outside the virtualenv: pdfminer is imported correctly, but it can't import pdfminer.high_level. So I found the following:
- When I uninstall pdfminer.six, it looks in /usr/local/lib/python3.10/dist-packages and removes it from there.
- When I install pdfminer.six, it looks in /usr/lib/python3/dist-packages. Here, the high_level package is present. But the system always looks in /usr/local/lib/python3.10/dist-packages, so the package is never found.
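The two observations above can be checked from the interpreter itself. The following is only a minimal diagnostic sketch: the helper names locate_package and has_submodule_file are made up for illustration, and it reports what the interpreter running it would do, which may differ from what /usr/bin/pdf2txt does.

```python
import importlib.util
import os
import sys


def locate_package(name):
    """Return the directories the package `name` would be imported from ([] if none)."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.submodule_search_locations is None:
        # Not importable at all, or a plain module rather than a package.
        return []
    return list(spec.submodule_search_locations)


def has_submodule_file(package_dirs, submodule):
    """Check whether <submodule>.py or a <submodule>/ directory exists in any package dir."""
    for directory in package_dirs:
        candidate = os.path.join(directory, submodule)
        if os.path.isfile(candidate + ".py") or os.path.isdir(candidate):
            return True
    return False


if __name__ == "__main__":
    dirs = locate_package("pdfminer")
    print("interpreter:", sys.executable)
    print("pdfminer resolved from:", dirs or "not found")
    print("high_level present there:", has_submodule_file(dirs, "high_level"))
    print("package directories in search order:")
    for entry in sys.path:
        if "packages" in entry:
            print("  ", entry)
```

Running this once with the system python3 and once inside the virtualenv should show which copy of pdfminer each interpreter picks up first.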
So, I think that I found the cause of the problem. Running pdf2txt doesn't work in the virtualenv because it's still a file in /usr/bin that will look for a system-wide version. I suppose that I can update a system environment path to point to /usr/lib/python3/dist-packages and solve it like this (pdf2txt is actually not a binary, so I can append to sys.path). But why has this discrepancy occurred in the first place? And what is the proper way to deal with it? After all, there is a reason why some packages are installed in different locations.
Many thanks.
EDIT: Adding to sys.path doesn't work, but adding to PYTHONPATH does. I am not sure whether this is something that I should watch out for in the future, or just a result of a Python executable installed in /usr/bin undermining the idea of using a virtualenv in the first place.
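One possible explanation for the difference, assuming that the sys.path change was an append: Python uses the first matching package on the search path, appended entries go to the end, and PYTHONPATH entries are placed ahead of the dist-packages directories. The sketch below illustrates only the ordering effect with two throwaway directories and a made-up package called demo_pkg. It does not touch the real pdfminer installation.

```python
import os
import tempfile
from importlib.machinery import PathFinder


def make_package(root, name, modules):
    """Create package `name` under `root` with empty files for the given modules."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    for module in ["__init__"] + list(modules):
        with open(os.path.join(pkg_dir, module + ".py"), "w") as handle:
            handle.write("")
    return pkg_dir


def resolve(name, search_path):
    """Return the directory `name` would be imported from for this search path."""
    spec = PathFinder.find_spec(name, search_path)
    return None if spec is None else os.path.dirname(spec.origin)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for /usr/local/lib/python3.10/dist-packages and
    # /usr/lib/python3/dist-packages (assumed roles, not the real directories).
    incomplete_root = os.path.join(tmp, "local")
    complete_root = os.path.join(tmp, "system")
    incomplete = make_package(incomplete_root, "demo_pkg", [])
    complete = make_package(complete_root, "demo_pkg", ["high_level"])

    # Complete copy listed last (like an append): the incomplete copy is found first.
    appended = resolve("demo_pkg", [incomplete_root, complete_root])
    # Complete copy listed first (like PYTHONPATH): the complete copy is found first.
    prepended = resolve("demo_pkg", [complete_root, incomplete_root])

    print("listed last: ", appended)
    print("listed first:", prepended)
```

If this is what happened here, inserting the directory at the front of sys.path would behave like PYTHONPATH, although removing the leftover incomplete copy is probably the cleaner fix.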