I'm trying to use pdftotext, but it won't import.
I'm running Windows 10 (64 bit) on a Lenovo IdeaPad S340, a work laptop.
Following the directions here and here (which were super helpful), I:
- Installed Microsoft Visual C++ Build Tools.
- Installed Anaconda.
- Got the latest version of Anaconda and updated it, using a separate Anaconda3 command for each of these steps. I don't recall the commands and haven't found them again.
- Updated Microsoft Visual 14.
- Used conda to install poppler via Anaconda3 command:
conda install -c conda-forge poppler
- Used pip to install pdftotext via Anaconda3 command:
pip install pdftotext
After that:
This happens in the Python 3.8 (32 bit) command prompt:
>>> import pdftotext
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pdftotext'
>>>
This happens in IDLE's Python 3.7.5 Shell (64 bit):
>>> import pdftotext
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
import pdftotext
ModuleNotFoundError: No module named 'pdftotext'
>>>
This happens in the Anaconda3 command prompt:
import pdftotext
'import' is not recognized as an internal or external command,
operable program or batch file.
This also happens in Anaconda3 command prompt:
pip install pdftotext
Requirement already satisfied: pdftotext in c:\programdata\anaconda3\lib\site-packages (2.1.4)
Does that mean it only runs in Python 2? How would I have checked that beforehand? If it does only run on Python 2, can you recommend a Python 3 package/module/library (what is the difference, btw?) for reading a PDF into a plain text file?
Thanks for your help!
Update:
I started over with a new user on the same machine and OS (the other user had a space in the name, so its filepath had a space, which can cause problems). I'm hitting the same problem.
I have Python 3.7.6 and 3.8.1. Python 3.7.6 is what shows up when checking the version through the Anaconda3 prompt with python -V (3.7.6.final.0 when using conda info).
I also have:
- Anaconda Version "custom", Build py37_1.
- conda 4.8.2, py37_0, Channel conda-forge.
- poppler 0.84.0, h1affe6b_0, conda-forge.
- pdftotext 2.1.4, pypi_0, pypi.
I found Python here: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64.
I looked through the program files, the user files, and Anaconda Navigator by hand, and I ran a search of my entire C drive for 'pdftotext', but I didn't find anything about pdftotext.
Attempting from IDLE's Python 3.7.6 shell didn't work either.
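With several Python installations on one machine, it may help to ask each shell which interpreter it is actually running and whether that interpreter can find pdftotext. Here is a minimal sketch using only the standard library; describe_module is a hypothetical helper name I made up, not part of any package.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file the running interpreter would load `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Which python executable is running this code, and which version is it?
print("Interpreter:", sys.executable)
print("Version:    ", sys.version.split()[0])

# Can this particular interpreter see the pdftotext package?
print("pdftotext:  ", describe_module("pdftotext") or "not found by this interpreter")
```

Running the same lines in the Python 3.8 prompt, in IDLE, and in python started from the Anaconda3 prompt should show whether those are different executables with different site-packages folders.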
Update:
I figured it out, sorta. pdftotext is not working as a Python import, which is how the example code on PyPI uses it. But it does work as a command-line tool that is part of Xpdf, with no additional installation beyond the steps above.
I used the command in the Anaconda3 PowerShell command prompt:
pdftotext C:\filepath\file.pdf
It then created a text file with the same name and saved it in the same folder. There are additional options for the command outlined on the Xpdf page I linked above (like setting your file name).
Buuuut, this is not a satisfying solution. I'm able to take care of my current use-case task, with an additional step, but I'm still not able to call pdftotext from within a Python program.
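As a stopgap, the command-line tool can be driven from a Python program with the standard subprocess module. This is only a sketch, assuming the pdftotext executable is on PATH for whichever interpreter runs it and that its output decodes as UTF-8; build_command and pdf_to_text_cli are hypothetical names of mine.

```python
import shutil
import subprocess


def build_command(pdf_path, txt_path="-"):
    """Build the argument list for the pdftotext command-line tool.

    Passing "-" as the text file asks pdftotext to write to standard output.
    """
    return ["pdftotext", pdf_path, txt_path]


def pdf_to_text_cli(pdf_path):
    """Run the pdftotext executable and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext executable is not on PATH")
    result = subprocess.run(
        build_command(pdf_path),
        capture_output=True,  # collect stdout instead of printing it
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # Assumption: the tool emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

For example, pdf_to_text_cli(r"C:\filepath\file.pdf") would return the text as a string instead of writing a .txt file next to the PDF.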
Update:
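If you install pdftotext through Anaconda (conda for poppler, then pip from the Anaconda3 prompt), importing it seems to work only in the Python interpreter started from within the Anaconda3 shell. Presumably that is because pip put it in Anaconda's own site-packages folder (c:\programdata\anaconda3\lib\site-packages), which the other Python installations don't search.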
So, I had to switch to the Python interpreter mode in the Anaconda3 PowerShell first: python
Then, I could import pdftotext with no error: import pdftotext
It looked like this:
(user)> python
Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdftotext
>>>
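Once the import works, reading a PDF into plain text follows the pattern of the example on the package's PyPI page: open the file in binary mode, pass it to pdftotext.PDF, and join the pages. A minimal sketch, assuming it runs under the Anaconda interpreter where the import succeeds; pdf_to_text is a hypothetical wrapper name.

```python
def pdf_to_text(pdf_path):
    """Return the text of every page in a PDF, separated by blank lines."""
    # Imported inside the function so an interpreter without the package
    # only fails when the function is actually called.
    import pdftotext

    with open(pdf_path, "rb") as f:
        pages = pdftotext.PDF(f)  # behaves like a sequence of page strings
    return "\n\n".join(pages)
```

Writing the result out is then ordinary file handling, for example open("file.txt", "w", encoding="utf-8").write(pdf_to_text("file.pdf")).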