How can I install pdftotext properly?

I'm getting the error message below when installing pdftotext in Python 3.6. I also tried to install the package manually by downloading the zip file but still got the same error.

  pdftotext/pdftotext.cpp(4): fatal error C1083: Cannot open include file: 'poppler/cpp/poppler-document.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2     
smci
mtryingtocode
  • You need poppler installed. I'm not sure if Windows is supported for pdftotext. The GitHub page only lists install dependencies for Linux. – Håken Lid Aug 28 '17 at 07:13

7 Answers

I found some help in the README.md file of the pdftotext package:

1) Install the OS dependencies:

On Debian, Ubuntu, and friends:

sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python3-dev

On Fedora, Red Hat, and friends:

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel redhat-rpm-config

2) Do the normal install:

pip install pdftotext

and it worked for me.
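The two dependency commands can be wrapped in a small helper that picks the right one for the current distro. This is a minimal sketch, not part of pdftotext. The function names are hypothetical, and reading ID from /etc/os-release is an assumption about how to detect the distro. It also uses the Python 3 header packages (python3-dev / python3-devel) because the question is about Python 3.6. It only prints the command, so you can review it before running it with sudo.

```shell
#!/bin/sh
# deps_command ID  (hypothetical helper)
# Prints, but does not run, the dependency install command for a distro ID.
deps_command() {
    case "$1" in
        debian|ubuntu)
            echo "sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python3-dev"
            ;;
        fedora|rhel|centos)
            echo "sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel redhat-rpm-config"
            ;;
        *)
            echo "unsupported distro: $1" >&2
            return 1
            ;;
    esac
}

# detect_distro_id [FILE]  (hypothetical helper)
# Prints the ID= field of an os-release file (default /etc/os-release),
# stripping quotes such as ID="centos". VERSION_ID= and ID_LIKE= are not matched.
detect_distro_id() {
    file="${1:-/etc/os-release}"
    sed -n 's/^ID=//p' "$file" | tr -d '"'
}
```

For example, deps_command "$(detect_distro_id)" prints the line to paste into your terminal before running pip install pdftotext.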

herve-guerin

I've been trying to figure out how to install pdftotext on Win10 for a few days, and Internet searches have given me nothing. So, for those who need to know, here's how to install pdftotext on Win10 with Anaconda. YMMV.

Install Anaconda Python. There are many articles on installing Anaconda, so I won't explore that here.

If you try to run pip install pdftotext, you will get an error saying that Microsoft Visual C++ is required.

Navigate in a browser to http://visualstudio.microsoft.com/downloads. Under the Tools for Visual Studio 2019 tab, download the Build Tools for Visual Studio 2019. Then install the tools by checking the C++ build tools option box and clicking Install.

The pip install should now get past the VC++ error. Unfortunately, you’ll now get the error “Cannot open include file: ‘poppler/cpp/poppler-document.h’”. This is because you’re missing the poppler libraries.

Head back to the internets! You’ll need poppler for Windows. At the time of this writing, your best option is http://blog.alivate.com.au/poppler-windows. Grab the latest binary and uncompress it. If you look at the error, pip is looking for the header file at {Anaconda3 directory}\include\poppler\cpp\poppler-document.h. So look in the archive you just unzipped: in the include folder you’ll see a poppler directory, and if you go down into the cpp directory inside it, you’ll find the poppler-document.h file.

I copied the entire poppler directory into the Anaconda3\include folder, so do that.

If you try to run pip install again, you'll still get a ton of errors! But these are not the errors you saw previously; instead, the build is looking for a missing linked library, poppler-cpp.lib. A search through Conda installs on another machine found this file in the poppler package, so run:

conda install -c conda-forge poppler

This installs the poppler-cpp.lib file. Then copy the file from its home at {Anaconda3 directory}\Library\lib\poppler-cpp.lib and paste it where pdftotext expects it, at {Anaconda3 directory}\libs.
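If you use Git Bash (or another POSIX-style shell on Windows), the two copy steps can be scripted. This is a sketch under several assumptions. ANACONDA_DIR is your Anaconda3 folder. POPPLER_DIR is the unzipped poppler-for-Windows folder that contains include/poppler. The layout matches the paths described in this answer. The function name is hypothetical.

```shell
#!/bin/sh
# copy_poppler_into_anaconda ANACONDA_DIR POPPLER_DIR  (hypothetical helper)
copy_poppler_into_anaconda() {
    anaconda="$1"
    poppler="$2"
    # 1) Headers: the compiler looks for {Anaconda3}\include\poppler\cpp\poppler-document.h
    if [ ! -f "$poppler/include/poppler/cpp/poppler-document.h" ]; then
        echo "poppler-document.h not found under $poppler/include" >&2
        return 1
    fi
    mkdir -p "$anaconda/include"
    cp -R "$poppler/include/poppler" "$anaconda/include/"
    # 2) Library: conda-forge's poppler puts poppler-cpp.lib in Library\lib,
    #    but the link step looks in {Anaconda3}\libs
    if [ ! -f "$anaconda/Library/lib/poppler-cpp.lib" ]; then
        echo "poppler-cpp.lib missing; run: conda install -c conda-forge poppler" >&2
        return 1
    fi
    mkdir -p "$anaconda/libs"
    cp "$anaconda/Library/lib/poppler-cpp.lib" "$anaconda/libs/"
}
```

Usage would look like copy_poppler_into_anaconda "/c/Users/you/Anaconda3" "/c/Users/you/Downloads/poppler-0.68.0" (both paths are placeholders), followed by pip install pdftotext.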

If we do a pip install pdftotext again, there it is! I’m sure someone will find a way to refine this a bit, but for now we have a working pdftotext Python library on Win10.

These directions can be found, with screenshots, at my blog https://coder.haus/2019/09/27/installing-pdftotext-through-pip-on-windows-10/

Jason Woods
  • Thank you so much for the detailed instructions on your blog. I followed the steps and was able to install the lib on Win10 x64. I'd like to add just one thing: while installing the C++ build tools, I had initially unchecked all 4 optional components, but it did not work without them, so it would be worth mentioning in the blog that they are required too. Their exact names: MSVC v142 - VS 2019 C++ x64/x86 build tools, Windows 10 SDK (10.0.18362.0), C++ CMake tools for Windows, Testing tools core features - Build Tools – Harshad Vyawahare Oct 14 '19 at 11:11
  • Thanks for the feedback, Harshad, and glad it worked for you! I'll take a look at the instructions and get them updated. As a note, a PR was merged into the project to make installation easier on Windows, and it will make it to PyPI eventually. The maintainer of the project is also looking to generate pre-compiled binaries for Windows, with no expected timeline. – Jason Woods Oct 22 '19 at 10:24
  • Hey mate, thanks a lot for those steps; everything worked beautifully up to the step conda install -c conda-forge poppler, which printed: Collecting package metadata (current_repodata.json): done Solving environment: failed with initial frozen solve. Retrying with flexible solve. Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source. Collecting package metadata (repodata.json): done Solving environment: failed with initial frozen solve. Retrying with flexible solve. Solving environment: - Found conflicts! Looking for incompatible packages... – Ming Xuan Nov 11 '19 at 07:46
  • It then proceeds to check a bunch of stuff; it's been running for 10 hours straight and it's still not done. Any idea what's going on? – Ming Xuan Nov 11 '19 at 07:49
  • 24 hours later, the check ultimately failed and the install too! UnsatisfiableError: The following specifications were found to be incompatible with each other: (longest list I've ever seen follows). TL;DR: does not work for me. If anyone managed to get past that step, let me know! – Ming Xuan Nov 11 '19 at 21:40
  • Hi @MingXuan this is often due to another module in conflict either with poppler, or in conflict with the version of Python you have. The first few lines tell you which module is in conflict, like this UnsatisfiableError: The following specifications were found to be in conflict: - enum34 -> python 2.6*|2.7*|3.3*|3.5* - python ==3.6.0 Take a look at this SO post, it has some ways you can work through this with initializing a new conda environment https://stackoverflow.com/questions/48589141/anaconda-unsatisfiableerror-the-following-specifications-were-found-to-be-in – Jason Woods Nov 18 '19 at 17:31
  • I think this is awesome, but is there any workaround for being unable to install Build Tools for Visual Studio 2019? I am working on a locked-down laptop and we're not allowed to install this software. – clover Dec 31 '19 at 02:29
  • Hi @clover, short of asking permission to install, options are slim. The maintainer of the pdftotext Python package hasn't set up CI for Windows yet, so a precompiled package isn't available. From a branch on their GitHub repo, it looks like they're working on this. If you have VM software like Hyper-V, VirtualBox, Parallels, etc., you can download a Windows VM directly from Microsoft, install your tools, and map a drive to your code on the local machine. I do this on my corporate machine. Microsoft makes the VM available at https://developer.microsoft.com/en-us/windows/downloads/virtual-machines – Jason Woods Dec 31 '19 at 13:24
  • Works well! Thanks – Alessandro Corradini Jan 02 '20 at 20:14
  • This worked like a charm for me on Win10, thanks! – mrbTT Mar 09 '21 at 14:13

The command below solved the problem for me:

sudo apt-get install libpoppler-cpp-dev

https://blog.droidzone.in/2018/05/01/install-pdftotext-python-extension-error/
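Before re-running pip, you can confirm that the header named in the original error now exists. This is a minimal sketch. It assumes the Debian/Ubuntu layout, where libpoppler-cpp-dev installs its headers under /usr/include/poppler/cpp. The helper name is hypothetical.

```shell
#!/bin/sh
# find_poppler_header [PREFIX...]  (hypothetical helper)
# Prints the first PREFIX/poppler/cpp/poppler-document.h that exists,
# defaulting to the usual system include directories.
find_poppler_header() {
    [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
    for prefix in "$@"; do
        header="$prefix/poppler/cpp/poppler-document.h"
        if [ -f "$header" ]; then
            echo "$header"
            return 0
        fi
    done
    echo "poppler-document.h not found; install libpoppler-cpp-dev" >&2
    return 1
}
```

Running find_poppler_header with no arguments checks the default locations. If pkg-config is installed, pkg-config --cflags --libs poppler-cpp should also succeed once the package is in place.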

Ajay Singh

And for macOS:

brew install poppler

brew install pkg-config poppler python
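Because the second command installs pkg-config alongside poppler, a reasonable guess is that the pdftotext build asks pkg-config where poppler's C++ bindings live. That is an assumption; check the package's setup script. Under that assumption, here is a quick pre-flight check before pip install pdftotext. The helper name is hypothetical. PKG_CONFIG is an optional override for the pkg-config binary and defaults to pkg-config.

```shell
#!/bin/sh
# check_poppler_cpp  (hypothetical helper)
# Succeeds, printing the compiler/linker flags, if pkg-config can see poppler-cpp.
check_poppler_cpp() {
    pc="${PKG_CONFIG:-pkg-config}"
    if ! command -v "$pc" >/dev/null 2>&1; then
        echo "pkg-config not found; run: brew install pkg-config" >&2
        return 1
    fi
    if ! "$pc" --exists poppler-cpp; then
        echo "poppler-cpp not visible to pkg-config; run: brew install poppler" >&2
        return 1
    fi
    "$pc" --cflags --libs poppler-cpp
}
```

If check_poppler_cpp prints include and library flags, the build should be able to find poppler. If it fails, the message names the brew package that is likely missing.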

Max S.
Dasma

Simple solution for Windows:

  1. Download the poppler zip file from http://blog.alivate.com.au/wp-content/uploads/2018/10/poppler-0.68.0_x86.7z
  2. Download and install visual studio tools from https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15
  3. Add the folder \poppler-0.68.0\bin to the PATH environment variable.

That's it. Restart your environment (e.g. Jupyter Notebook, VS Code, etc.). Enjoy!
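In Git Bash or another POSIX shell on Windows, step 3 can also be done for the current session only. The permanent change still goes through the Environment Variables dialog. This is a minimal sketch: the helper name is hypothetical, and it assumes the unzipped poppler folder has a bin subfolder, as in the 0.68.0 archive above.

```shell
#!/bin/sh
# add_poppler_bin_to_path POPPLER_DIR  (hypothetical helper)
# Prepends POPPLER_DIR/bin to PATH for this shell session,
# skipping it if the folder is missing or already on PATH.
add_poppler_bin_to_path() {
    bin="$1/bin"
    if [ ! -d "$bin" ]; then
        echo "no bin folder in $1" >&2
        return 1
    fi
    case ":$PATH:" in
        *":$bin:"*) ;;                       # already on PATH, nothing to do
        *) PATH="$bin:$PATH"; export PATH ;;
    esac
}
```

Calling it twice is harmless, because the case check prevents duplicate PATH entries.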

West
  • The blog is no longer maintained. Download here: https://github.com/oschwartz10612/poppler-windows – Owen Schwartz Jun 27 '20 at 22:03
  • The OP was asking about Windows. This is the best answer. – Jeroen Dec 06 '21 at 10:39
  • The only additional things I had to do to make this work: 1) copy the contents of /Library/lib/ to your /Libs folder, and 2) copy the contents of /Library/include/ (a poppler folder) to /include/. – Jeroen Dec 06 '21 at 12:57

For Ubuntu users:

sudo apt-get install libpoppler58=0.41.0-0ubuntu1 libpoppler-dev libpoppler-cpp-dev

worked for me.

Sami

To install pdftotext on Windows 10, I tried to follow Jason Woods' answer.

I want to add to this answer that it is also necessary to have the "Desktop development with C++" workload installed in Visual Studio.

Make sure to install the "C++ Build Tools" as well, as mentioned in Jason Woods' answer.

Follow the rest of his answer. Quick summary:

  • install Anaconda Python
  • in the Anaconda Prompt, type: conda install -c conda-forge poppler
  • now install the pdftotext package: pip install pdftotext

It worked for me. Thank you.
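To confirm that the build really worked, import the compiled module with the same interpreter pip used. This is a minimal sketch for a POSIX shell. The helper name is hypothetical, and it assumes the interpreter is called python unless you pass another name.

```shell
#!/bin/sh
# check_pdftotext [PYTHON]  (hypothetical helper)
# Imports pdftotext with the given interpreter (default: python)
# and prints the path of the compiled extension it loaded.
check_pdftotext() {
    py="${1:-python}"
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "interpreter not found: $py" >&2
        return 1
    fi
    "$py" -c "import pdftotext; print(pdftotext.__file__)"
}
```

In the Anaconda Prompt, which is cmd rather than a POSIX shell, you can run the one-liner directly: python -c "import pdftotext; print(pdftotext.__file__)". An ImportError at this point usually means the poppler DLLs or libraries from the earlier steps aren't being found.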

Martin Graupner