Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF, “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
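As a rough illustration of the extension list above, the following sketch checks whether a file name carries one of those extensions before trying to open it. The names SUPPORTED_EXTENSIONS and is_supported are hypothetical helpers made up for this example and are not part of PyMuPDF; the check looks only at the file name, not at the file contents.

```python
from pathlib import Path

# Extensions named in the tag description above.
# SUPPORTED_EXTENSIONS and is_supported are illustrative names, not PyMuPDF API.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(path: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    The comparison is case-insensitive and inspects only the name.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.pdf", "novel.EPUB", "notes.txt"):
        print(name, is_supported(name))
```

A viewer could call such a helper to filter a directory listing before handing files to the library.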

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz is the name of the graphics library that MuPDF is built on.
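A minimal sketch of the import, assuming PyMuPDF may or may not be installed in the current environment. The helper load_fitz is a hypothetical name used only for this example; it returns None instead of raising when the import fails, so the snippet runs either way.

```python
def load_fitz():
    """Try to import PyMuPDF under its import name 'fitz'.

    Returns the module, or None if the import fails. load_fitz is an
    illustrative helper, not part of PyMuPDF.
    """
    try:
        import fitz  # the package is installed as "PyMuPDF" but imported as "fitz"
    except ImportError:
        return None
    return fitz


if __name__ == "__main__":
    fitz = load_fitz()
    if fitz is None:
        print("fitz could not be imported; try: pip install PyMuPDF")
    else:
        print("fitz imported from", getattr(fitz, "__file__", "unknown location"))
```

Printing the module's location is a quick way to confirm which installed package the name fitz actually resolves to.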

257 questions
61
votes
10 answers

How do I resolve "No module named 'frontend'" error message?

I have installed PyMuPDF/fitz because I am trying to extract images from PDF files. However, upon running the code below, I am seeing No module named 'frontend'. doc = fitz.open(pdf_path) for i in range(len(doc)): for…
Waqar
  • 817
  • 2
  • 8
  • 18
11
votes
2 answers

Installing PyMuPDF on MacOS Big Sur

I wanted to import fitz in my code. To do that, I tried installing PyMuPDF using pip3 install PyMuPDF. However, this installation fails and returns this error: fitz/fitz_wrap.c:2754:10: fatal error: 'fitz.h' file not found #include
Nirali
  • 253
  • 3
  • 9
9
votes
1 answer

PyMUPDF - How to convert PDF to image, using the original document settings for the image size and set to 300dpi?

I'm currently looking at using the Python package PyMuPDF for a workflow that converts PDFs to images (in my case, .TIFF files). I am trying to mimic the behaviour of another program that I currently use for PDF -> Image conversion. In that…
adan11
  • 647
  • 1
  • 7
  • 24
9
votes
3 answers

Extract images from PDF in high resolution with Python

I have managed to extract images from several PDF pages with the below code, but the resolution is quite low. Is there a way to adjust that? import fitz pdffile = "C:\\Users\\me\\Desktop\\myfile.pdf" doc = fitz.open(pdffile) for page_index in…
Omega
  • 750
  • 1
  • 8
  • 24
8
votes
3 answers

PyMuPDF: AttributeError: module 'fitz' has no attribute 'open'

pip3 install PyMuPDF Collecting PyMuPDF Using cached PyMuPDF-1.18.17-cp37-cp37m-win_amd64.whl (5.4 MB) Installing collected packages: PyMuPDF Successfully installed PyMuPDF-1.18.17 import fitz doc = fitz.open("my_pdf.pdf") When I look for def open…
Utopion
  • 935
  • 1
  • 4
  • 15
6
votes
5 answers

Unable to install PyMuPDF on alpine docker image

I am trying to install the pymupdf package on an alpine image but am getting the error below: fitz/fitz_wrap.c:2739:10: fatal error: ft2build.h: No such file or directory 2739 | #include | ^~~~~~~~~~~~ compilation…
Nitin Goyal
  • 497
  • 4
  • 16
5
votes
2 answers

How can I fix the 'Error in PyMuPDF' when installing paddleocr with pip?

When doing pip install paddleocr, I am facing an error in building wheel for PyMuPDF. Building wheels for collected packages: PyMuPDF Building wheel for PyMuPDF (setup.py) ... error error: subprocess-exited-with-error × python setup.py…
Jinen Rathore
  • 65
  • 1
  • 6
5
votes
2 answers

Saving a pymupdf fitz object to s3 as a pdf

I am trying to crop a PDF and save it to S3 with the same name using Lambda. I am getting an error about the data type being a fitz.fitz.page: import os import json import boto3 from urllib.parse import unquote_plus import fitz, sys from io import…
megv
  • 1,421
  • 5
  • 24
  • 36
5
votes
2 answers

How to change the highlight color in pdf using fitz module in python

Hi, I am trying to change the highlight color in a PDF but am not able to do so. The default highlight color is yellow, but I want to change it. Following is my code: import fitz doc = fitz.open(r"path\input.pdf") page=doc[0] text="some…
Gavya Mehta
  • 211
  • 6
  • 20
5
votes
1 answer

Camelot PDF dimensions

I have searched stackoverflow extensively before posting this and have not been able to find anything on camelot page dimensions. There is this question, which suggests using table_region but that does not solve OP's problem or mine. I unfortunately…
Jinx
  • 511
  • 1
  • 3
  • 10
5
votes
4 answers

Issues with PyMuPDF extracting plain text

I want to read in a PDF file using PyMuPDF. All I need is plain text (no need to extract info on color, fonts, tables etc.). I have tried the following: import fitz from fitz import TextPage ifile = "C:\\user\\docs\\aPDFfile.pdf" doc =…
PyRsquared
  • 6,970
  • 11
  • 50
  • 86
4
votes
1 answer

How to read pdf images as opencv images using PyMuPDF?

I would like to read all images found in a PDF file by PyMuPDF as OpenCV images, as close as possible to the source (avoiding funky format conversions that would lead to precision loss). Basically, I would like the result to be the exact same as…
Vincent
  • 57,703
  • 61
  • 205
  • 388
4
votes
1 answer

Efficiently extract the highlighted portion from PDFs using PyMuPDF python?

I have a use case where I have to highlight a table in a PDF document and then extract the highlighted part using Python. Once it is highlighted, I have to transform the extracted part into a dataframe, which should look like this: name…
technophile_3
  • 531
  • 6
  • 21
4
votes
3 answers

Print all objects inside a PDF file with Python

I'd like to list all objects present in a PDF file: text blocks, images, fonts, page objects, but also vector shapes (if any). I hoped to see all of them with PyMuPDF: import fitz # pip install PyMuPDF doc = fitz.open('test.pdf') for xref in…
Basj
  • 41,386
  • 99
  • 383
  • 673
4
votes
0 answers

extracting text using flags to focus on bold / italic font using PyMUPDF

I am trying to extract bold text elements from PDFs using PyMUPDF 1.18.14. I was hoping that this would work as I understand from the docs that flags=4 targets bold font. page = doc[1] text = page.get_text(flags=4) print(text) But it prints out all…
Cam
  • 1,263
  • 13
  • 22