75

I'm trying to use pdf2image and it seems I need something called poppler:

(sum_env) C:\Users\antoi\Documents\Programming\projects\summarizer>python ocr.py -i fr13_idf.pdf
Traceback (most recent call last):
  File "c:\Users\antoi\Documents\Programming\projects\summarizer\sum_env\lib\site-packages\pdf2image\pdf2image.py", line 165, in __page_count
    proc = Popen(["pdfinfo", pdf_path], stdout=PIPE, stderr=PIPE)
  File "C:\Python37\lib\subprocess.py", line 769, in __init__
    restore_signals, start_new_session)
  File "C:\Python37\lib\subprocess.py", line 1172, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ocr.py", line 53, in <module>
    pdfspliterimager(image_path)
  File "ocr.py", line 32, in pdfspliterimager
    pages = convert_from_path("document-page%s.pdf" % i, 500)
  File "c:\Users\antoi\Documents\Programming\projects\summarizer\sum_env\lib\site-packages\pdf2image\pdf2image.py", line 30, in convert_from_path
    page_count = __page_count(pdf_path, userpw)
  File "c:\Users\antoi\Documents\Programming\projects\summarizer\sum_env\lib\site-packages\pdf2image\pdf2image.py", line 169, in __page_count
    raise Exception('Unable to get page count. Is poppler installed and in PATH?')
Exception: Unable to get page count. Is poppler installed and in PATH?

I tried this link, but what I downloaded from it didn't solve my problem.

A_Arnold
Revolucion for Monica
  • 3
    Iggy, I have noticed that many other people are having similar issues with Poppler on Windows. So, I wrote a short article on how to resolve this using WSL. You can find the article here (Poppler on Windows): https://medium.com/@matthew_earl_miller/poppler-on-windows-179af0e50150 – Matthew E. Miller Jan 09 '20 at 20:27

15 Answers

71

pdf2image is only a wrapper around poppler (not propeller!). To use the module, you need to have poppler-utils installed on your machine and on your PATH.

The procedure is linked in the project's README in the "How to install" section.
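A quick way to check whether Poppler is visible to Python is to look for pdfinfo, the executable named in the traceback in the question. This is only a minimal sketch using the standard library; the function name poppler_on_path is made up for illustration and is not part of pdf2image.

```python
import shutil


def poppler_on_path():
    """Return the full path to pdfinfo if it is on PATH, else None."""
    # pdf2image shells out to Poppler's command-line tools; pdfinfo is the
    # one that appears in the traceback from the question.
    return shutil.which("pdfinfo")


if __name__ == "__main__":
    location = poppler_on_path()
    if location is None:
        print("pdfinfo not found: Poppler is missing or not on PATH")
    else:
        print("pdfinfo found at", location)
```

If this prints a location but pdf2image still fails, the script is probably running in a different environment than the one you tested in.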

Belval
34

First, download Poppler from here, then extract it. In your code, just pass poppler_path=r'C:\Program Files\poppler-0.68.0\bin' (for example), as below:

from pdf2image import convert_from_path

images = convert_from_path("mypdf.pdf", 500, poppler_path=r'C:\Program Files\poppler-0.68.0\bin')
for i, image in enumerate(images):
    fname = 'image' + str(i) + '.png'
    image.save(fname, "PNG")

Now it's done. With this trick there is no need to set environment variables. Let me know if you have any problems.
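For what it's worth, the 500 passed as the second positional argument of convert_from_path is the dpi parameter (pdf2image defaults to 200). PDF page sizes are measured in points (1/72 inch), so the output size can be estimated as below. This is a rough sketch; rendered_size is a hypothetical helper, and Poppler's own rounding may differ by a pixel.

```python
def rendered_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a PDF page rendered at the given DPI."""
    # PDF page dimensions are expressed in points; 1 point = 1/72 inch.
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792, 500))  # (4250, 5500)
print(rendered_size(612, 792, 200))  # (1700, 2200)
```

At 500 dpi each page is a large image, so conversion is slower and uses more memory than at the default resolution.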

Rajkumar
  • Alternatively, you can add the poppler_path as above to your windows path environment in the system settings. Don't forget to reboot afterwards. This way, you do not need to add it to each new project. – 00zetti Apr 29 '21 at 12:47
  • @Rajkumar What does the number 500 refer to? – YasserKhalil Mar 02 '22 at 23:20
11

Both pdf2image and pdftotext require Poppler as their backend, so you have to install it:

conda install -c conda-forge poppler

The error should then be resolved. If it still doesn't work for you, you can follow http://blog.alivate.com.au/poppler-windows/ to install the library.

8

Poppler is not installed properly. You can get the correct package with:

sudo apt-get install poppler-utils

deepak sen
8

Poppler in path for pdf2image

When working with pdf2image, there are dependencies that need to be satisfied:

  1. Installation of pdf2image

    pip install pdf2image

  2. Installation of python-dateutil

    pip install python-dateutil

  3. Installation of Poppler

  4. Specifying Poppler path in environment variable (system path)

Installing Poppler on Windows

Adding Poppler to path

  • Extract the downloaded Poppler archive, e.g. C:\Users\UserName\Downloads\Release-21.11.0-0.zip
  • Add the bin folder of the extracted archive (the folder that contains pdfinfo.exe), not the .zip file itself, to the system Path variable in Environment Variables

Specifying poppler path in code

pages = convert_from_path(filepath, poppler_path=r"actualpoppler_path")
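
Before passing a folder as poppler_path, it can help to confirm that it really contains pdfinfo. The following is a small sketch using only the standard library; has_pdfinfo is a hypothetical helper name, not something pdf2image provides.

```python
import shutil


def has_pdfinfo(poppler_dir):
    """Return True if poppler_dir contains an executable named pdfinfo."""
    # Passing path= makes shutil.which search this directory instead of the
    # PATH environment variable; on Windows it also tries the usual
    # executable extensions such as .exe.
    return shutil.which("pdfinfo", path=poppler_dir) is not None
```

If this returns False for the folder you configured, you are most likely pointing at the archive or its top-level folder rather than at its bin subfolder.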
dataninsight
6

On Windows, to solve PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?:

Antony
  • 1
    In order to install Choco run the following command as Powershell Admin `Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))` – Muneeb Ahmad Khurram Sep 07 '21 at 20:51
3

In Windows

Install Poppler for Windows.

  • 500 = DPI (resolution) of the rendered images, not the JPG quality

  • path = the folder that contains the PDF files

  • pip install pdf2image

     import os
     from pdf2image import convert_from_path

     path = r'C:\ABC\FEF\KLH\pdf_extractor\output\break'

     def spliting_pdf2img(path):
         for file in os.listdir(path):
             if file.lower().endswith(".pdf"):
                 pages = convert_from_path(os.path.join(path, file), 500, poppler_path=r'C:\ABC\DEF\Downloads\poppler-0.68.0\bin')
                 # Number each page so later pages do not overwrite earlier ones
                 for i, page in enumerate(pages, start=1):
                     page.save(os.path.join(path, "%s_%d.jpg" % (os.path.splitext(file)[0], i)), 'JPEG')
    

On Linux/Ubuntu, install the packages below from the terminal:

  • sudo apt-get update

  • sudo apt-get install poppler-utils

     import os
     from pdf2image import convert_from_path

     # Replace with your own folder; this Windows-style path is kept from the example above
     path = r'C:\ABC\FEF\KLH\pdf_extractor\output\break'

     def spliting_pdf2img(path):
         for file in os.listdir(path):
             if file.lower().endswith(".pdf"):
                 # No poppler_path needed: apt puts the Poppler tools on PATH
                 pages = convert_from_path(os.path.join(path, file), 500)
                 # Number each page so later pages do not overwrite earlier ones
                 for i, page in enumerate(pages, start=1):
                     page.save(os.path.join(path, "%s_%d.jpg" % (os.path.splitext(file)[0], i)), 'JPEG')
    
thrinadhn
3

If anyone still has this error on Windows, I solved the problem by:

  • Download the latest binary of Poppler for Windows from Poppler for Windows
  • Unzip it into C drive like C:\poppler-0.68.0
  • Specify the Poppler path like this:
from PIL import Image
import pytesseract
import sys
from pdf2image import convert_from_path
import os

ROOT_DIR = os.path.abspath(os.curdir)

# Path of the pdf 
PDF_file = ROOT_DIR + r"\PdfToImage\src\2.pdf"
  
''' 
Part #1 : Converting PDF to images 
'''
  
# Store all the pages of the PDF in a variable 
pages = convert_from_path(PDF_file, 500, poppler_path=r'C:\poppler-0.68.0\bin')
Malki Mohamed
2

FOR MAC, if you have brew installed, that is the way to go.

brew install poppler

Takes several minutes to install all the dependencies, but pdf2image will work afterwards.

This is a repeat of an answer here and the answer is also in a comment on this page. Adding this answer b/c it took me a while to find the correct solution FOR MACs.

MattC
  • for mac M1: Error: Cannot install in Homebrew on ARM processor in Intel default prefix (/usr/local)! – territorial Jan 19 '23 at 20:31
  • Yeah, I seem to remember having some M1 issues with Homebrew. Pretty common and several ways to get around. Some solutions are listed here https://stackoverflow.com/questions/64963370/error-cannot-install-in-homebrew-on-arm-processor-in-intel-default-prefix-usr. – MattC Feb 20 '23 at 20:40
0

I'm working on a mac in Visual Studio Code and I encountered this error. I followed the install instructions and was able to verify the packages were installed but the error persisted when running in VSC.

Even though I had python.condaPath and python.pythonPath specified in my settings.json, it wasn't until I activated the conda environment inside the VSC integrated terminal itself

conda activate my_env

that the error went away.

Bizarre.

Danoram
0

After downloading Poppler, add its bin folder to PATH at the top of your script:

import os
os.environ["PATH"] += os.pathsep + r"C:.....\poppler-xxxxxxx\bin"

This makes the Poppler tools visible to the running script. It worked for me.
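
One caveat with this approach: assigning a new value to os.environ["PATH"] replaces the whole search path of the running process, so appending to the existing value is usually safer. A minimal sketch of a helper that appends without creating duplicates follows; add_to_path is a hypothetical name, and the env argument exists only to make it easy to try on a plain dict.

```python
import os


def add_to_path(directory, env=None):
    """Append directory to PATH in env (default: os.environ) if absent."""
    env = os.environ if env is None else env
    entries = env.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        entries.append(directory)
    # Drop empty entries so an initially empty PATH stays clean.
    env["PATH"] = os.pathsep.join(e for e in entries if e)
    return env["PATH"]
```

Call it once with your Poppler bin folder before the first call to convert_from_path. The change only affects the current Python process and anything it launches.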

0

I had the same problem on my Mac.
I solved it by changing poppler_path from poppler_path='/usr/bin' to poppler_path='/usr/local/bin'. You can also print all the places Poppler might be on your Mac by running echo $PATH in the Terminal, then try each of them as poppler_path.
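
Instead of trying every PATH entry by hand, you can let Python report which of them contain pdfinfo. This is a sketch using only the standard library; dirs_with_pdfinfo is a hypothetical helper name.

```python
import os
import shutil


def dirs_with_pdfinfo(path_value=None):
    """List the PATH directories that contain a pdfinfo executable."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for directory in path_value.split(os.pathsep):
        # Check each directory on its own so every match is reported,
        # not only the first one that a normal PATH lookup would return.
        if directory and shutil.which("pdfinfo", path=directory):
            found.append(directory)
    return found


print(dirs_with_pdfinfo())
```

Any directory it prints is a candidate value for poppler_path. An empty list means Poppler is not on the PATH that this Python process sees.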

  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30126219) – cse Oct 20 '21 at 07:53
0

I had the same issue on Mac using Visual Studio Code and a conda environment.

I found out that I could run the code from the command line, but not from VS Code. I then printed the environment variables when running from the command line and in VS Code using:

print(os.environ)

When I compared the two, I noticed that the "PATH" variable was different. My conda environment was not in the "PATH" variable in VS code. I think this means that VS code was not correctly activating my conda environment. I therefore took my "PATH" from the command line and set it in my launch.json environment variables. Then the problem was fixed.

"configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "python": "/Users/<username>/miniconda3/envs/<env_name>/bin/python",
            "env": {
                "PATH":"<PATH STRING from command line>"
            },
            "program": "${file}"
        }
]
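
To spot the difference between the two PATH values without reading them by eye, something like the following sketch can help. The function name missing_entries and the example PATH strings are made up for illustration; paste in the values printed in your own terminal and in VS Code.

```python
import os


def missing_entries(terminal_path, editor_path, sep=os.pathsep):
    """Return directories present in terminal_path but absent from editor_path."""
    editor_dirs = set(editor_path.split(sep))
    return [d for d in terminal_path.split(sep) if d and d not in editor_dirs]


# Hypothetical PATH strings, as printed in each environment
terminal = "/Users/me/miniconda3/envs/my_env/bin:/usr/local/bin:/usr/bin"
editor = "/usr/local/bin:/usr/bin"
print(missing_entries(terminal, editor, sep=":"))
# ['/Users/me/miniconda3/envs/my_env/bin']
```

If the conda environment's bin directory shows up in the result, VS Code is launching Python without activating the environment, which matches what I saw.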
phil
0

I more or less followed the steps from one of the previously posted answers, except that I had to add the path to the environment variables. Passing the path to pdf2image.convert_from_path didn't work for me. So, if anyone still has this error on Windows, I solved the problem by:

  1. Download the latest binary of Poppler for Windows from Poppler Windows

  2. Unzip it into C drive like C:\poppler-0.68.0

  3. Specify the Poppler path in environment variables

Poppler path in env variables

Ann
-3

I had the same issue, but I fixed it in my Django project by changing the working directory. First, you need to store the PDF file inside your media directory. Then you need to change the current directory to this media directory (where the PDF file is stored). This is the code snippet from my Django project where I convert a .pdf to a .jpg:

import os

from PIL import Image
from pdf2image import convert_from_path

def convert_pdf_2_image(uploaded_image_path, uploaded_image, img_size):
    project_dir = os.getcwd()
    os.chdir(uploaded_image_path)
    try:
        file_name = str(uploaded_image).replace('.pdf', '')
        output_file = file_name + '.jpg'
        # Only the first page is converted
        pages = convert_from_path(uploaded_image, 200)
        pages[0].save(output_file, 'JPEG')
        # Resize while still inside the media directory, where output_file lives
        img = Image.open(output_file)
        img = img.resize(img_size, Image.LANCZOS)  # ANTIALIAS was removed in Pillow 10
        img.save(output_file)
    finally:
        # Always restore the original working directory
        os.chdir(project_dir)
    return output_file
Abhay
  • 1
    Your code is missing imports and still results in the poppler error message if the original reason for this error is not resolved. – Ryan Harris Jun 04 '20 at 18:10
  • Yup, `convert_from_path` is from `pdf2image` which requires GPL-licensed `poppler`. @abhay, I'd delete this answer – mirekphd May 20 '23 at 11:56