I'm trying to use pdftotext, but it won't import.

I'm running Windows 10 (64 bit) on a Lenovo IdeaPad S340, a work laptop.

Following the directions here and here (which were super helpful), I:

  1. Installed Microsoft Visual C++ Build Tools.
  2. Installed Anaconda.
  3. Got the latest version of Anaconda and updated it, using a separate Anaconda3 command for each of these steps. I don't recall the commands and haven't found them again.
  4. Updated Microsoft Visual 14.
  5. Used conda to install poppler via Anaconda3 command: conda install -c conda-forge poppler
  6. Used pip to install pdftotext via Anaconda3 command: pip install pdftotext

After that:

This happens in the Python 3.8 (32 bit) command prompt:

>>> import pdftotext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pdftotext'
>>>

This happens in IDLE's Python 3.75 Shell (64 bit):

>>> import pdftotext
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import pdftotext
ModuleNotFoundError: No module named 'pdftotext'
>>> 

This happens in the Anaconda3 command prompt:

import pdftotext
'import' is not recognized as an internal or external command,
operable program or batch file.

This also happens in Anaconda3 command prompt:

pip install pdftotext
Requirement already satisfied: pdftotext in c:\programdata\anaconda3\lib\site-packages (2.1.4)

Does that mean it only runs in Python 2? How would I have checked that beforehand? If it does only run on Python 2, can you recommend a Python 3 package/module/library (what is the difference, btw?) for reading a PDF into a plain text file?

Thanks for your help!

Update:

I started over with a new user on the same machine and OS (the other user had a space in the name, so its filepath had a space, which can cause problems). I'm hitting the same problem.

I have Python 3.7.6 and 3.8.1. Python 3.7.6 is what shows up when checking the version in the Anaconda3 prompt with python -V (3.7.6.final.0 when using conda info).

I also have:

  • Anaconda Version "custom", Build py37_1.
  • conda 4.8.2, py37_0, Channel conda-forge.
  • poppler 0.84.0, h1affe6b_0, conda-forge.
  • pdftotext 2.1.4, pypi_0, pypi.

I found Python here: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64.

I looked through the program files, the user files, and Anaconda Navigator, and I ran a search of my entire C drive for 'pdftotext', but I didn't find anything about pdftotext.

Attempting from IDLE's Python 3.7.6 shell didn't work either.

Update:

I figured it out, sorta. pdftotext is not working as a Python import, as the example code in PyPI uses it. But, it does work as a command line tool that is part of Xpdf, with no additional installation after the steps.

I used the command in the Anaconda3 PowerShell command prompt:

pdftotext C:\filepath\file.pdf

It then created a text file with the same name and saved it in the same folder. There are additional options for the command outlined on the Xpdf page I linked above (like setting your file name).

Buuuut, this is not a satisfying solution. I'm able to take care of my current use-case task, with an additional step, but I'm still not able to call pdftotext from within a Python program.

Update:

If you install pdftotext using Anaconda and conda, then importing it seems to only work when you run it in the Python interpreter from within the Anaconda3 shell.

So, I had to switch to the Python interpreter mode in the Anaconda3 PowerShell first: python

Then, I could import pdftotext with no error: import pdftotext

It looked like this:

(user)> python
Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdftotext
>>> 
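
For anyone hitting the same thing: a minimal sketch for checking which interpreter a given prompt or shell is actually running, and whether that interpreter can see pdftotext. The helper name describe_module is made up for illustration; sys.executable and importlib.util.find_spec are standard library. Run it in each environment (IDLE, the plain Python prompt, and python started from the Anaconda3 prompt) and compare the output:

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file the running interpreter would import `name` from, or None."""
    # describe_module is a hypothetical helper name, not part of any library.
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Which python.exe is running, and which version is it?
print("Interpreter:", sys.executable)
print("Version:    ", sys.version.split()[0])

# Can this particular interpreter find the package?
location = describe_module("pdftotext")
print("pdftotext:  ", location or "not importable from this interpreter")
```

If the interpreter path is not under the Anaconda3 folder that pip reported above (c:\programdata\anaconda3), that interpreter has its own site-packages and will not see a package installed into Anaconda's.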
Kaleb Coberly
  • Can you share the exact steps you took to install the library, including ones involving conda? When you tried importing the library, were you using the Python install from the correct environment? – AMC Jan 29 '20 at 05:31
  • did you try: `pip3 install pdftotext`? – SuperKogito Jan 29 '20 at 08:58
  • @SuperKogito, pip3 is not recognized as a command. – Kaleb Coberly Jan 29 '20 at 13:49
  • @AMC, I didn't want to rewrite the directions I linked to. I followed those steps, in the order I outlined above. – Kaleb Coberly Jan 29 '20 at 13:52
  • It looks like you installed the library in one Python version while trying to call it from another. You can try installing the library from the IPython console using `!pip install pdftotext` (this usually works for me in Spyder), but I wouldn't advise that hack. The best thing you can do is first figure out which Python versions you have and where they are. You can refer to [this](https://stackoverflow.com/questions/48342098/how-to-check-python-anaconda-version-installed-on-windows-10-pc) to check the versions. Feel free to post the output; it should help us better understand the issue. – SuperKogito Jan 29 '20 at 14:10
  • Okay, finally able to return to this project! I started over with a new user on the same machine and OS (the other user had a space in the name, so its filepath had a space, which can cause problems). I'm hitting the same problem. I have Python 3.7.6 and 3.8.1. Python 3.7.6 is what shows up when checking the version through the Anaconda3 prompt (3.7.6.final.0 when using ```conda info```). I also have: Anaconda Version "custom", Build py37_1; conda 4.8.2, py37_0, Channel conda-forge; poppler 0.84.0, h1affe6b_0, conda-forge; and, pdftotext 2.1.4, pypi_0, pypi. – Kaleb Coberly Feb 05 '20 at 20:27
  • I found Python here: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64. But, I searched all over the program files, user files, and on the Anaconda Navigator, and I didn't find anything about pdftotext. – Kaleb Coberly Feb 05 '20 at 20:29
  • @SuperKogito, yep, I never tried running the Python interpreter from the Anaconda3 shell. That was it all along. – Kaleb Coberly Feb 11 '20 at 09:04

3 Answers

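pdftotext is not a module but a command, so you can do the following:

import subprocess

# Raw string, so the backslashes in the Windows path are not read as escapes
file_path = r"C:\documents\mypdf.pdf"

# Capture the text in a variable: "-" tells pdftotext to write to stdout
result = subprocess.run(
    ["pdftotext", file_path, "-"], capture_output=True, text=True, check=True
)
text = result.stdout

# Write the text to a file instead
subprocess.run(["pdftotext", file_path, "data.txt"], check=True)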
Artyom Vancyan
  • thanks! I'm going to try that. Incidentally, I just came back to update because I realized the problem all along was that I never tried using the Anaconda3 shell as a Python interpreter. So, typing ```python``` into the command line to switch into the Python interpreter mode, then ```import pdftotext``` returns no errors so far. It definitely is a module that you import and call on within your code, as you can see at https://pypi.org/project/pdftotext/. – Kaleb Coberly Feb 11 '20 at 08:39
  • So I recommend you download Linux on your PC and avoid such exceptions – Artyom Vancyan Feb 11 '20 at 22:38

I had the same problem, but after running the following, it worked like a charm!

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

pip install pdftotext
Flair
  • You seem to have missed the part where it says "Windows 10". Unless you know a way to run apt under Windows? – Sylvain Jun 30 '23 at 11:12

Okay, I figured it out! If you install pdftotext using Anaconda and conda, then importing it seems to only work when you run it in the Python interpreter from within the Anaconda3 shell.

So, I had to switch to the Python interpreter mode in the Anaconda3 PowerShell first: python

Then, I could import pdftotext with no error: import pdftotext

It looked like this:

(user)> python
Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdftotext
>>> 
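
Once the import works, a minimal sketch of using the module from a script might look like this. It assumes the script is run with the same Anaconda interpreter, and it relies on the pdftotext.PDF class shown in the package's PyPI example; the function name extract_text is made up for illustration:

```python
def extract_text(pdf_path):
    """Return the text of every page of the PDF at pdf_path, separated by blank lines."""
    # extract_text is a hypothetical helper name. The import sits inside the
    # function so that a missing package only raises when the function is called.
    import pdftotext

    # The PyPI example opens the PDF in binary mode and passes the file object in.
    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)

    # A PDF object behaves like a sequence of page strings.
    return "\n\n".join(pdf)
```

Calling extract_text(r"C:\filepath\file.pdf") should then return the text as a Python string, so it can be processed directly instead of going through an intermediate text file.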

Ooor, a second partial solution is that it works as a command line tool that is part of Xpdf.

I needed no additional installation after the steps taken in the problem post. I used the command in the Anaconda3 PowerShell command prompt:

pdftotext C:\filepath\file.pdf

It then created a text file with the same name and saved it in the same folder. There are additional options for the command outlined on the Xpdf page I linked above (like setting your file name).

The problem with the second solution of using it from the command line is that if you want to do something with the text file afterwards, you have to run another command or script. All the command does is write the text to a file.

Kaleb Coberly
  • Hi, I am installing poppler via conda, but it doesn't import poppler, saying module not found. Did you see the same? Here are the details: https://stackoverflow.com/questions/61488601/unable-to-import-poppler-even-after-installing-in-conda – Baktaawar Apr 28 '20 at 20:28
  • @Baktaawar, I'm posting my response in your post. – Kaleb Coberly May 01 '20 at 03:42