How to turn a large PDF's pages to images efficiently?

Asked May 22 '22 at 22:27

Active May 22 '22 at 22:27

I have a FastAPI endpoint to receive a PDF with many pages.

The client takes photos of several documents, combines them into one PDF, and sends it to this endpoint.

The endpoint should convert every page back to an image and do some processing on each one. This should obviously be done in the background.

What is the most efficient way to do this? I have tried `pdf2image`, but as its documentation says:

A relatively big PDF will use up all your memory and cause the process to be killed

asked May 22 '22 at 22:27

Adham Salama

Split the PDF into individual pages and use `pdf2image` on them one at a time. – MattDMo May 22 '22 at 22:41
That documentation was written a generation ago, when machines had memory in the megabytes. You can safely use `pdf2image` for this. – Tim Roberts May 22 '22 at 22:45
@TimRoberts I tried it just now. It used over 5 GB of RAM and kept running for over 100 seconds, so I had to kill the process. – Adham Salama May 22 '22 at 22:49
@MattDMo pdf2image works on an entire file, how can I use it with individual pages? – Adham Salama May 22 '22 at 22:52
@KJ They are not images of text. – Adham Salama May 23 '22 at 00:03
I don't know why my question was closed. The question from 13 years ago doesn't even answer my question, but still, my question has other parts to it, like how to do this in the background using FastAPI. I definitely don't think this question should have been closed. – Adham Salama May 23 '22 at 13:28
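
For reference, the page-at-a-time idea from the comments does not require splitting the file first: `pdf2image.convert_from_path` accepts `first_page` and `last_page`, and `pdfinfo_from_path` reports the page count. Below is a minimal sketch of that approach, not a tested solution for this exact workload. The names `page_ranges` and `convert_pdf_in_chunks`, the chunk size, the DPI, and the output file naming are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator, Tuple


def page_ranges(total_pages: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first_page, last_page) pairs covering the PDF."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_pdf_in_chunks(pdf_path: str, out_dir: str,
                          chunk_size: int = 5, dpi: int = 150) -> int:
    """Render the PDF a few pages at a time and save each page as a JPEG.

    Returns the number of pages written. (Hypothetical helper, not part of
    pdf2image.)
    """
    # Imported here so page_ranges() stays usable without pdf2image/poppler.
    from pdf2image import convert_from_path, pdfinfo_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    total_pages = int(pdfinfo_from_path(pdf_path)["Pages"])
    written = 0
    for first, last in page_ranges(total_pages, chunk_size):
        # Only this slice of pages is held in memory as PIL images.
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for offset, image in enumerate(images):
            # Per-page processing would go here before (or instead of) saving.
            image.save(out / f"page_{first + offset:04d}.jpg", "JPEG")
            written += 1
        del images  # drop references before rendering the next slice
    return written
```

With FastAPI, one option is to save the upload to disk in the endpoint and then schedule the conversion with `background_tasks.add_task(convert_pdf_in_chunks, saved_path, out_dir)`, where `background_tasks` is a `BackgroundTasks` parameter of the endpoint and `saved_path` and `out_dir` are placeholders. Peak memory should then scale with `chunk_size` and `dpi` rather than with the total page count, though the actual numbers depend on the page dimensions.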

0 Answers