12

I want to convert a one-page PDF into a PNG file. I installed pdf2image, but on Windows it fails with an error saying that poppler is not installed.

According to the question "Poppler in path for pdf2image", poppler has to be installed and added to PATH.

I cannot do either of those (I don't have the necessary permissions on the system I am working with).

I had a look at OpenCV and PIL, and neither seems to support this conversion: PIL (see https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html?highlight=pdf#pdf) cannot read PDFs, it can only save images as PDFs. The same goes for OpenCV.

Any suggestions on how to do the PDF to PNG conversion? I can install any Python library, but I cannot touch the Windows installation.

thanks

Seon
JFerro
  • 1
    I HAVE to do it in python because I can only connect to the APIs from a Jupyter Hub environment, and it has to be done on the fly. – JFerro Oct 20 '21 at 15:05
  • Lucky you, thank the admins for protecting your code from infection by `poppler`'s "viral" copyleft (GPL) [license](https://gitlab.freedesktop.org/poppler/poppler/-/blob/master/COPYING) – mirekphd May 20 '23 at 11:37

3 Answers

11

PyMuPDF supports pdf to image rasterization without requiring any external dependencies.

Sample code to do a basic pdf to png transformation:

import fitz  # PyMuPDF, imported as fitz for backward compatibility reasons
file_path = "my_file.pdf"
doc = fitz.open(file_path)  # open document
for i, page in enumerate(doc):
    pix = page.get_pixmap()  # render page to an image
    pix.save(f"page_{i}.png")
Andrey
Seon
  • Hi @Seon but you are importing a my_file.png, I understand that it could be a pdf right? – JFerro Oct 20 '21 at 15:16
  • That was indeed a typo, fixed it! – Seon Oct 20 '21 at 16:09
  • How can you convert just the first 10 pages? – Hans Peter Dec 07 '21 at 07:24
  • 1
    `doc` is indexable, so you can just use a for loop: `for i in range(10)`, and set `page=doc[i]`. – Seon Dec 07 '21 at 17:36
  • 1
    Thanks for your competent comments, @Seon - just an addition: the new PyMuPDF version 1.22.0 also supports saving to JPEG directly, without having to use Pillow: `pix.save("file.jpg", jpg_quality=n)`. As can be seen, the JPEG quality can be chosen with an additional parameter. – Jorj McKie Apr 17 '23 at 11:03
  • Note it is [licensed](https://github.com/pymupdf/PyMuPDF/blob/main/COPYING) under AGPL, which still requires you to disclose source, like GPL-licensed `poppler` called by `pdf2image` (and network use is deemed to be distribution). – mirekphd May 20 '23 at 11:59
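
The suggestion in the comments to convert only the first pages can be sketched roughly as follows. This is a minimal sketch, not part of the original answer: the function names `page_indices` and `convert_first_pages` and the `prefix` parameter are made up for illustration, and PyMuPDF is assumed to be installed and importable as `fitz`.

```python
def page_indices(page_count, max_pages=10):
    """Return the indices of the pages to render, capped at the document length."""
    return range(min(page_count, max_pages))


def convert_first_pages(pdf_path, max_pages=10, prefix="page"):
    """Render at most `max_pages` pages of a PDF to PNG files; return the file names."""
    # PyMuPDF is imported here (an illustrative choice) so that page_indices()
    # can be used and checked without the library being installed.
    import fitz

    written = []
    doc = fitz.open(pdf_path)  # open document
    try:
        for i in page_indices(doc.page_count, max_pages):
            pix = doc[i].get_pixmap()  # doc is indexable, as noted in the comment
            out_name = f"{prefix}_{i}.png"
            pix.save(out_name)
            written.append(out_name)
    finally:
        doc.close()
    return written
```

Calling `convert_first_pages("my_file.pdf")` should then write page_0.png, page_1.png, and so on, stopping early if the document has fewer than 10 pages.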
9

Here is a snippet that generates PNG images of arbitrary resolution (dpi):

import fitz
fname = "my_file.pdf"
dpi = 300  # choose desired dpi here
zoom = dpi / 72  # zoom factor, standard: 72 dpi
magnify = fitz.Matrix(zoom, zoom)  # magnifies in x, resp. y direction
doc = fitz.open(fname)  # open document
for page in doc:
    pix = page.get_pixmap(matrix=magnify)  # render page to an image
    pix.save(f"page-{page.number}.png")

This generates PNG files named page-0.png, page-1.png, ... Choosing dpi < 72 produces thumbnail-sized page images.

Jorj McKie
  • 2
    second row should be fname =, not file_path = – Chadee Fouad Dec 02 '22 at 02:58
  • From their rtd (https://pymupdf.readthedocs.io/en/latest/recipes-images.html): "Since version 1.19.2 there is a more direct way to set the resolution: Parameter "dpi" (dots per inch) can be used in place of "matrix". To create a 300 dpi image of a page specify pix = page.get_pixmap(dpi=300). Apart from notation brevity, this approach has the additional advantage that the dpi value is saved with the image file – which does not happen automatically when using the Matrix notation." – Joschua Apr 15 '23 at 23:34
  • Note the `fitz` Github repo has been archived by the owner on Aug 3, 2022. It is now read-only. The only version on PyPI is a 5-year-old version tagged "pre-release":) – mirekphd May 20 '23 at 12:02
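
Following the documentation quoted in the comments, the `dpi` parameter of `get_pixmap` (available since PyMuPDF 1.19.2 according to that quote) can replace the explicit matrix. The sketch below assumes such a version is installed and importable as `fitz`. The names `zoom_for_dpi`, `render_pages` and `prefix` are hypothetical helpers introduced only for this example. `zoom_for_dpi` merely restates the dpi / 72 relation from the answer.

```python
POINTS_PER_INCH = 72  # base resolution used in the answer above


def zoom_for_dpi(dpi):
    """Zoom factor that the Matrix-based approach would need for a given dpi."""
    return dpi / POINTS_PER_INCH


def render_pages(pdf_path, dpi=300, prefix="page"):
    """Render every page of a PDF to PNG at the requested dpi; return the file names."""
    # Imported inside the function (an illustrative choice) so that
    # zoom_for_dpi() stays usable without PyMuPDF installed.
    import fitz

    written = []
    doc = fitz.open(pdf_path)  # open document
    try:
        for page in doc:
            # dpi= replaces matrix=fitz.Matrix(zoom, zoom) from the answer
            pix = page.get_pixmap(dpi=dpi)
            out_name = f"{prefix}-{page.number}.png"
            pix.save(out_name)
            written.append(out_name)
    finally:
        doc.close()
    return written
```

For example, `render_pages("my_file.pdf", dpi=300)` is intended to produce the same page-0.png, page-1.png, ... files as the Matrix version with `zoom_for_dpi(300)`, with the dpi value additionally stored in the image files, as the quoted documentation states.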
0
import fitz

input_pdf = r"Samples\104295.pdf"

output_jpg = r"Output\104295.jpg"

# Render the first page of the PDF and save it as a JPEG
def split_and_convert(pdf_path, output_path):
    doc = fitz.open(pdf_path)
    page = doc.load_page(0)
    pix = page.get_pixmap()
    pix.save(output_path, "jpeg")
    doc.close()

split_and_convert(input_pdf, output_jpg)
  • Please add details explaining what your answer does and how it solves the problem, in addition to your code. – coradek Jun 21 '23 at 22:38