
I have found a few solutions here and elsewhere, but none of them have worked. Context: I am trying to get pdf2txt working on my Pop!_OS 22.04 LTS system. I am using Python 3.10.6, with no other versions installed. The command-line tool requires python3-pdfminer, which I installed with apt. Running it fails with an error saying there is no module named 'pdfminer.high_level'. A comment here notes that this module is part of pdfminer.six, which can be installed with pip, using a dash instead of a dot if it's inside a virtual environment.

$ python3 -m pip install pdfminer.six

reports that the requirement is already satisfied. To be sure, I also switched to a virtualenv and installed it there:

$ pip install pdfminer-six
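As far as pip is concerned, the dotted and dashed spellings are the same project, with or without a virtualenv: project names are normalized as described in PEP 503 (the `packaging` library exposes this rule as `packaging.utils.canonicalize_name`). The sketch below reimplements the rule with `re` purely for illustration; the helper name `normalize_project_name` is hypothetical.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name as described in PEP 503.

    Runs of '-', '_' and '.' collapse into a single '-', and the result is
    lowercased, so different spellings refer to the same project.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    for spelling in ("pdfminer.six", "pdfminer-six", "PDFMiner_Six"):
        print(f"{spelling!r:>16} -> {normalize_project_name(spelling)}")
```

So both install commands above request the same distribution; the difference between them lies in which interpreter (system or virtualenv) runs pip.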

In both cases, running pdf2txt produces the same error:

  File "/usr/bin/pdf2txt", line 9, in <module>
    import pdfminer.high_level
ModuleNotFoundError: No module named 'pdfminer.high_level'

I then tried to uninstall and reinstall pdfminer.six, starting with the system-wide installation. Running python3 -m pip uninstall pdfminer.six (or just pip3) was not allowed, so, in an error of judgment, I used sudo. Reinstalling now shows:

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pdfminer.six in /usr/lib/python3/dist-packages (-VERSION-)

so I am not sure whether it is installed properly. I then tried again in the virtualenv, where the installation went through without problems, but the same error remains.
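The "Defaulting to user installation" message means pip could not write to the system site-packages directory and fell back to the per-user one. To see which directories a given interpreter is actually choosing between, a small sketch like the following can be run once with /usr/bin/python3 and once with the virtualenv's python. It uses only the standard library; the function name `describe_site_dirs` is hypothetical, and the venv check assumes a `venv`-style environment.

```python
import site
import sys


def describe_site_dirs() -> dict:
    """Summarize where this interpreter looks for installed packages."""
    return {
        # The interpreter actually running this code.
        "executable": sys.executable,
        # For venv-style environments, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Site/dist-packages directories for this interpreter.
        "site_packages": site.getsitepackages(),
        # Per-user directory pip falls back to when site-packages is not writeable.
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in describe_site_dirs().items():
        print(f"{key}: {value}")
```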

I also reinstalled python3-pdfminer, without success.

In the virtualenv, I found:

./lib/python3.10/site-packages/pdfminer/high_level.py
./lib/python3.10/site-packages/pdfminer/__pycache__/high_level.cpython-310.pyc

I then created a test Python file that imports pdfminer.high_level and ran it inside the virtualenv, with no problem. I then did the same outside the virtualenv: pdfminer itself is imported correctly, but pdfminer.high_level cannot be imported. Digging further, I found the following:

  • When I uninstall pdfminer.six, pip looks in /usr/local/lib/python3.10/dist-packages and removes it from there.
  • When I install pdfminer.six, pip looks in /usr/lib/python3/dist-packages, where the high_level module is present. But Python always looks in /usr/local/lib/python3.10/dist-packages first, so the module is never found.
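On Debian-derived systems such as Ubuntu and Pop!_OS, /usr/local/lib/python3.X/dist-packages (where `sudo pip` installs) normally precedes /usr/lib/python3/dist-packages (where apt installs) on `sys.path`. If an incomplete `pdfminer` package sits in /usr/local, it is imported first, and `pdfminer.high_level` is then looked up only inside that copy. A minimal sketch to confirm which copy wins, using only the standard library (`locate` is a hypothetical helper name):

```python
import importlib.util
import sys


def locate(module_name: str):
    """Return the file `module_name` would be imported from, or None.

    find_spec walks sys.path in order, so the first directory containing
    the top-level package wins, even if a later directory holds a more
    complete copy. For a dotted name, find_spec imports the parent first.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. 'pdfminer') is missing
        # while looking up a submodule such as 'pdfminer.high_level'.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("Search order:")
    for entry in sys.path:
        print("   ", entry or "<current directory>")
    for name in ("pdfminer", "pdfminer.high_level"):
        print(f"{name}: {locate(name)}")
```

If `pdfminer` resolves to /usr/local/... while `pdfminer.high_level` resolves to None, the /usr/local copy is shadowing the apt one.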

So I think I have found the cause of the problem. Running pdf2txt does not work in the virtualenv either, because it is still the script in /usr/bin, which uses the system-wide Python installation. I suppose I could update a system path variable to point to /usr/lib/python3/dist-packages and solve it that way (pdf2txt is actually a Python script rather than a binary, so I can also append to sys.path inside it). But why did this discrepancy occur in the first place, and what is the proper way to deal with it? After all, there is a reason why some packages are installed in different locations.
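A script in /usr/bin runs under whichever interpreter its first (`#!`) line names, regardless of which virtualenv is active. Debian-packaged scripts usually hard-code `/usr/bin/python3`, but that is an assumption worth checking on your own copy. The sketch below reads that line and guesses whether running the script directly would follow an activated venv; all helper names are hypothetical.

```python
import shutil
from pathlib import Path


def parse_shebang(first_line: str):
    """Split a '#!' line into words, or return None if it is not one."""
    line = first_line.strip()
    if not line.startswith("#!"):
        return None
    return line[2:].split() or None


def read_shebang(script: str):
    """Read and parse the first line of `script`."""
    with open(script, "rb") as f:
        first = f.readline().decode("utf-8", errors="replace")
    return parse_shebang(first)


def follows_active_venv(words) -> bool:
    """Guess whether a script with this shebang picks up an active venv.

    '#!/usr/bin/env python3' resolves python3 through PATH, which an activated
    venv puts first; an absolute path such as '#!/usr/bin/python3' always runs
    that specific interpreter.
    """
    return bool(words) and Path(words[0]).name == "env"


if __name__ == "__main__":
    path = shutil.which("pdf2txt")
    if path:
        words = read_shebang(path)
        print(path, words, "follows venv:", follows_active_venv(words))
```

Inside the virtualenv, pdfminer.six installs its own command-line script into the venv's bin directory (upstream ships it as `pdf2txt.py` rather than `pdf2txt`), and that script should use the venv's interpreter.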

Many thanks.

EDIT: Appending to sys.path doesn't work, but adding the directory to PYTHONPATH does. I am not sure whether this is something I should watch out for in the future, or just a result of a Python script installed in /usr/bin undermining the idea of using a virtualenv in the first place.
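A plausible explanation for the difference in the EDIT: `sys.path.append` puts the directory last, after /usr/local/lib/python3.10/dist-packages, so the incomplete `pdfminer` there is still imported first; PYTHONPATH entries are placed ahead of the site-packages directories, so the complete copy wins. `sys.path.insert(0, ...)` should therefore behave like PYTHONPATH. The self-contained simulation below reproduces this with two temporary directories and a hypothetical package `demo_pdfminer` standing in for pdfminer.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_pdfminer"  # hypothetical stand-in for the real 'pdfminer'


def make_package(root: Path, with_high_level: bool) -> None:
    """Create root/demo_pdfminer, optionally with a high_level submodule."""
    pkg = root / PKG
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    if with_high_level:
        (pkg / "high_level.py").write_text("OK = True\n")


def can_import_high_level(search_order: list) -> bool:
    """Try 'import demo_pdfminer.high_level' with these dirs searched first."""
    # Drop earlier imports so each attempt starts from scratch.
    for name in list(sys.modules):
        if name == PKG or name.startswith(PKG + "."):
            del sys.modules[name]
    saved = list(sys.path)
    sys.path[:0] = [str(p) for p in search_order]
    importlib.invalidate_caches()
    try:
        importlib.import_module(PKG + ".high_level")
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path[:] = saved


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp, "usr_local")  # plays /usr/local/lib/python3.10/dist-packages
        full = Path(tmp, "usr_lib")     # plays /usr/lib/python3/dist-packages
        make_package(stale, with_high_level=False)
        make_package(full, with_high_level=True)
        # Like sys.path.append(...): the complete copy is searched last.
        appended = can_import_high_level([stale, full])
        # Like PYTHONPATH or sys.path.insert(0, ...): the complete copy comes first.
        prepended = can_import_high_level([full, stale])
    return appended, prepended


if __name__ == "__main__":
    print(demo())  # (False, True)
```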

JozuaK
  • Yes, thanks, but only in the sense of what I already knew, i.e. how to add to sys.path (which doesn't work in my case) or PYTHONPATH (which works). It doesn't answer my question about why this happens in the first place, but I guess that's fine; I just needed to learn what to look out for in the future. – JozuaK Oct 17 '22 at 10:53

1 Answer


Sometimes this issue happens when you have a file with the same name in your working directory, so check that first, for example whether you have a file named "pdfminer.py". If that is not the case, I usually try a previous version of the library. You can install a previous version with:

pip install pdfminer.six==20220506
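Both checks in this answer can be scripted. The sketch below uses only the standard library (function names are hypothetical): it looks for a same-named module in a directory and reports which pdfminer.six version the running interpreter sees, which is worth comparing before and after pinning an older release.

```python
from importlib import metadata
from pathlib import Path


def shadowing_candidates(module: str = "pdfminer", directory: str = ".") -> list:
    """List files in `directory` that would be imported instead of `module`.

    When a script runs, its own directory is sys.path[0] (the current
    directory for `python -c` or the REPL), so a local pdfminer.py or
    pdfminer/__init__.py is found before any installed copy.
    """
    d = Path(directory)
    candidates = [d / f"{module}.py", d / module / "__init__.py"]
    return [str(p) for p in candidates if p.is_file()]


def installed_version(dist: str = "pdfminer.six"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Possible shadowing files:", shadowing_candidates())
    print("pdfminer.six version:", installed_version())
```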

  • Not relevant in my case, but thanks for the suggestion (definitely appropriate in some circumstances). See edit in OP. – JozuaK Oct 17 '22 at 10:55