Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

The supported formats correspond to files with the extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub, so PyMuPDF can also be used to develop e-book viewers in Python.

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The reason is historical: Fitz was the name of the graphics library that MuPDF was built around.
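As a minimal sketch of the points above: the extension set below is copied from this description, while the helper names is_supported and page_count are hypothetical and not part of PyMuPDF. The only library calls used are fitz.open, len() on the document, and close(). The import is done lazily, which is an assumption of this sketch, so that the extension check also works where PyMuPDF is not installed.

```python
from pathlib import Path

# Extensions taken from the tag description above.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(path):
    """Return True if the file extension is one of those listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def page_count(path):
    """Open a document with PyMuPDF and return its number of pages.

    Hypothetical helper for illustration; raises ValueError for
    extensions that are not in SUPPORTED_EXTENSIONS.
    """
    if not is_supported(path):
        raise ValueError(f"unsupported file type: {path}")
    import fitz  # PyMuPDF is installed as "PyMuPDF" but imported as "fitz"

    doc = fitz.open(path)
    try:
        return len(doc)
    finally:
        doc.close()
```

Calling page_count on a real file requires pip install PyMuPDF; is_supported needs only the standard library.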

257 questions

votes

10 answers

How do I resolve "No module named 'frontend'" error message?

I have installed PyMuPDF/fitz because I am trying to extract images from PDF files. However, upon running the code below, I am seeing No module named 'frontend'. doc = fitz.open(pdf_path) for i in range(len(doc)): for…

python python-3.x mupdf pymupdf

asked Jun 05 '19 at 20:37

Waqar

votes

2 answers

Installing PyMuPDF on MacOS Big Sur

I wanted to import fitz in my code. To do that, I tried installing PyMuPDF using pip3 install PyMuPDF. However, this installation fails and returns this error: fitz/fitz_wrap.c:2754:10: fatal error: 'fitz.h' file not found #include …

python-3.x clang apple-m1 macos-big-sur pymupdf

asked Jul 28 '21 at 18:02

Nirali

votes

1 answer

PyMuPDF - How to convert PDF to image, using the original document settings for the image size and set to 300dpi?

I'm currently looking at using the Python package PyMuPDF for a workflow that converts PDFs to images (in my case, .TIFF files). I am trying to mimic the behaviour of another program that I currently use for PDF -> Image conversion. In that…

pymupdf

asked Oct 02 '21 at 08:13

adan11

votes

3 answers

Extract images from PDF in high resolution with Python

I have managed to extract images from several PDF pages with the below code, but the resolution is quite low. Is there a way to adjust that? import fitz pdffile = "C:\\Users\\me\\Desktop\\myfile.pdf" doc = fitz.open(pdffile) for page_index in…

python pdf pymupdf

asked Sep 10 '20 at 00:20

Omega

votes

3 answers

PyMuPDF: AttributeError: module 'fitz' has no attribute 'open'

pip3 install PyMuPDF Collecting PyMuPDF Using cached PyMuPDF-1.18.17-cp37-cp37m-win_amd64.whl (5.4 MB) Installing collected packages: PyMuPDF Successfully installed PyMuPDF-1.18.17 import fitz doc = fitz.open("my_pdf.pdf") When I look for def open…

python pymupdf

asked Sep 13 '21 at 09:14

Utopion

votes

5 answers

Unable to install PyMuPDF on alpine docker image

I am trying to install the pymupdf package on an Alpine image but am getting the error below: fitz/fitz_wrap.c:2739:10: fatal error: ft2build.h: No such file or directory 2739 | #include <ft2build.h> | ^~~~~~~~~~~~ compilation…

python docker alpine-linux pymupdf

asked Dec 21 '20 at 06:34

Nitin Goyal

votes

2 answers

How can I fix the 'Error in PyMuPDF' when installing paddleocr with pip?

When running pip install paddleocr, I get an error while building the wheel for PyMuPDF. Building wheels for collected packages: PyMuPDF Building wheel for PyMuPDF (setup.py) ... error error: subprocess-exited-with-error × python setup.py…

pip pymupdf paddleocr

asked Jun 01 '23 at 06:43

Jinen Rathore

votes

2 answers

Saving a pymupdf fitz object to s3 as a pdf

I am trying to crop a PDF and save it to S3 under the same name using Lambda. I am getting an error about the data type being a fitz.fitz.page import os import json import boto3 from urllib.parse import unquote_plus import fitz, sys from io import…

python pdf amazon-s3 aws-lambda pymupdf

asked Jan 31 '22 at 14:19

megv

votes

2 answers

How to change the highlight color in pdf using fitz module in python

Hi, I am trying to change the highlight color in a PDF but am not able to do so. The default highlight color is yellow, but I want to change it. Following is my code: import fitz doc = fitz.open(r"path\input.pdf") page=doc[0] text="some…

python pymupdf

asked Mar 06 '20 at 05:16

Gavya Mehta

votes

1 answer

Camelot PDF dimensions

I have searched Stack Overflow extensively before posting this and have not been able to find anything on Camelot page dimensions. There is this question, which suggests using table_region, but that does not solve OP's problem or mine. I unfortunately…

python python-camelot pymupdf

asked Dec 03 '19 at 19:19

Jinx

votes

4 answers

Issues with PyMuPDF extracting plain text

I want to read in a PDF file using PyMuPDF. All I need is plain text (no need to extract info on color, fonts, tables etc.). I have tried the following import fitz from fitz import TextPage ifile = "C:\\user\\docs\\aPDFfile.pdf" doc =…

python pdf pymupdf

asked Jun 04 '18 at 14:05

PyRsquared

votes

1 answer

How to read pdf images as opencv images using PyMuPDF?

I would like to read all images found in a PDF file by PyMuPDF as OpenCV images, as close as possible to the source (avoiding funky format conversions that would lead to precision loss). Basically, I would like the result to be the exact same as…

python-3.x image numpy opencv pymupdf

asked Jul 03 '22 at 16:51

Vincent

votes

1 answer

Efficiently extract the highlighted portion from PDFs using PyMuPDF python?

I have a use case where I have to highlight a table in a PDF document and then extract the highlighted part using Python. Once it is highlighted, I have to transform the extracted part into a dataframe that looks like this: name…

python pandas text-extraction pymupdf

asked Dec 07 '21 at 07:01

technophile_3

votes

3 answers

Print all objects inside a PDF file with Python

I'd like to list all objects present in a PDF file: text blocks, images, fonts, page objects, but also vector shapes (if any). I hoped to see all of them with PyMuPDF: import fitz # pip install PyMuPDF doc = fitz.open('test.pdf') for xref in…

python pdf data-mining pymupdf

asked Nov 15 '21 at 11:20

Basj

votes

0 answers

Extracting text using flags to focus on bold / italic font using PyMuPDF

I am trying to extract bold text elements from PDFs using PyMuPDF 1.18.14. I was hoping that this would work, as I understand from the docs that flags=4 targets bold font. page = doc[1] text = page.get_text(flags=4) print(text) But it prints out all…

python python-3.x search bold pymupdf

asked Jul 14 '21 at 17:42

Cam
