How to detect drawings and get their size from a pdf using python?

Question

I want to detect the figures or drawings in a PDF with Python and get the bounding box of each one. In the attached image, I only want the bounding box of the figure directly below the question, but my code also detects the "-" characters in the answer options.

I tried the fitz_page.get_drawings() method from the fitz (PyMuPDF) library. It lets me replicate the drawings in another PDF file, but it does not tell me where the first figure ends and the second begins. I just want the x-y coordinates of the bounding box of each figure separately.

https://stackoverflow.com/questions/22898145/how-to-extract-text-and-text-coordinates-from-a-pdf-file — Сергей Кох, Feb 22 '23 at 09:27

score 0 · Answer 1 · answered Feb 23 '23 at 13:36

I don't think PyMuPDF currently returns information about boundaries of individual figures, though we have some internal code that might allow this in the next release.

Julian Smith

Apologies, this comment rather overstated what our internal code can do - it's just a script that looks for overlapping paths. For more information, please see: https://github.com/pymupdf/PyMuPDF/issues/2247 – Julian Smith Mar 16 '23 at 10:53
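The "look for overlapping paths" idea can be sketched in a few lines. This is only a rough illustration, not the script referred to above: it assumes each entry returned by page.get_drawings() carries a "rect" bounding box, merges boxes that overlap or lie within a small gap of each other, and then drops clusters that are too small to be a figure (such as the "-" in the options). The names cluster_rects, drop_small and figure_boxes, and the gap and min_size defaults, are made up for this example and will likely need tuning per document.

```python
def _touches(a, b, gap):
    """True if rects a and b (x0, y0, x1, y1) overlap or are within `gap` points."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2] and
            a[1] - gap <= b[3] and b[1] - gap <= a[3])


def _union(a, b):
    """Smallest rect that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def cluster_rects(rects, gap=5.0):
    """Merge rects that touch (directly or through a chain) into one box each."""
    clusters = []
    for rect in rects:
        rect = tuple(rect)
        merged = True
        while merged:  # the grown rect may now reach further clusters
            merged = False
            for i, other in enumerate(clusters):
                if _touches(rect, other, gap):
                    rect = _union(rect, clusters.pop(i))
                    merged = True
                    break
        clusters.append(rect)
    return clusters


def drop_small(rects, min_size=10.0):
    """Heuristic: keep only boxes at least min_size wide AND min_size tall."""
    return [r for r in rects
            if (r[2] - r[0]) >= min_size and (r[3] - r[1]) >= min_size]


def figure_boxes(pdf_path, page_number=0, gap=5.0, min_size=10.0):
    """Hypothetical helper: candidate figure boxes for one page of a PDF."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    doc = fitz.open(pdf_path)
    page = doc[page_number]
    # Assumption: every drawing dict has a "rect" entry with its bounding box.
    rects = [tuple(d["rect"]) for d in page.get_drawings()]
    return drop_small(cluster_rects(rects, gap), min_size)
```

Calling figure_boxes("input.pdf") (the file name is a placeholder) returns a list of (x0, y0, x1, y1) tuples in page coordinates, one per cluster. The size filter is a blunt tool: it also discards genuine figures that consist of a single thin line, so check the thresholds against your own pages.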
