I am using the following code to save PDF pages as images, but the pages are being stored as PPM files instead of JPEG. How do I fix this?

from pdf2image import convert_from_path
pages = convert_from_path(path_to_pdf, output_folder=path_to_output, poppler_path=poppler_path)
for i in range(len(pages)):
    print(type(pages[i]))
    pages[i].save('page' + str(i) + '.jpg', 'JPEG')
ClawX69

1 Answer

Here is what is happening... With this line:

pages = convert_from_path(path_to_pdf, output_folder=path_to_output, poppler_path=poppler_path)

You are actually doing 2 things:

  1. writing .ppm files (pdf2image's default output format) to the output folder, and
  2. loading the pages, which are PIL.PpmImagePlugin.PpmImageFile objects.

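The JPEG is only written afterwards, by the save call. Because the file name there is relative, the .jpg ends up in the current working directory rather than in path_to_output: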

pages[i].save('page' + str(i) + '.jpg', 'JPEG')

So to get the result you want, drop output_folder from the convert_from_path call and use the output path when saving instead:

import os
from pdf2image import convert_from_path

# No output_folder here, so no .ppm files are left in path_to_output
pages = convert_from_path(path_to_pdf, poppler_path=poppler_path)
for i, page in enumerate(pages):
    # Write each page as a JPEG directly into the output folder
    page.save(os.path.join(path_to_output, 'page' + str(i) + '.jpg'), format='JPEG')
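
As an alternative, convert_from_path also accepts a fmt argument, whose default is 'ppm'. Passing fmt='jpeg' together with output_folder should make pdf2image write JPEG files into that folder itself. Here is a minimal sketch of that approach; the wrapper name convert_pdf_to_jpegs is made up for illustration, and the file names are generated by the library rather than being 'page0.jpg', 'page1.jpg' and so on.

```python
import os


def convert_pdf_to_jpegs(path_to_pdf, path_to_output, poppler_path=None):
    """Hypothetical helper: let pdf2image write JPEGs straight to a folder."""
    # Imported inside the function so that defining the helper does not
    # require pdf2image to be installed.
    from pdf2image import convert_from_path

    # Make sure the target folder exists before pdf2image writes into it.
    os.makedirs(path_to_output, exist_ok=True)

    # fmt='jpeg' replaces the default 'ppm', so the files written to
    # output_folder are JPEGs. The returned list holds the loaded pages.
    return convert_from_path(
        path_to_pdf,
        output_folder=path_to_output,
        fmt='jpeg',
        poppler_path=poppler_path,
    )
```

You would call it as convert_pdf_to_jpegs(path_to_pdf, path_to_output, poppler_path=poppler_path). If you need control over the file names, the explicit save loop above is the simpler option.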
artygo