How do I get a specific part of a page in a pdf and save it to a new pdf in python?

Question

I have very little experience manipulating PDFs in Python; so far I have only read them with the 'pdfreader' library. I have a PDF (in this case a past exam paper), and I want to split a page at the point where a given question number appears, say 12 (it would be formatted "12."), and save the part of the page containing question 12 to a new PDF. How do I do this?

I'm not a very good programmer so sorry if my question is stupid, but searching on the internet I could not find how to do this.

Actually it is much easier than this: I already wrote about 30 lines of Python to find the question and the page on which a specific word appears. My problem now is splitting the page at a specific point and saving the split part to another PDF. — Riccardo Piana, Feb 25 '22 at 18:59
@KJ, thanks for the help. I found this post, https://stackoverflow.com/questions/22898145/how-to-extract-text-and-text-coordinates-from-a-pdf-file, which together with your comments can help me find a solution. Also, what do you mean by "I don't have that many days left"? — Riccardo Piana, Feb 27 '22 at 12:43

score 0 · Accepted Answer · answered Mar 10 '22 at 12:13

In the end, the solution was to convert the PDF page to an image, crop the image where I wanted, and then convert the cropped image back to a PDF. To get the coordinates of the question number I used pdfminer. To turn those into pixel positions for cropping, I set up a proportion between the height of the page in PDF coordinates and the height in pixels of the image I wanted to create, which let me convert coordinates from one system to the other.
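The proportion step might look roughly like the sketch below. It is not my exact code: the function names and the example page and image sizes are made up for illustration, and it assumes the text coordinates use the usual PDF convention (origin at the bottom-left, y growing upward, units in points) while image rows are counted from the top. Rendering the page to an image and cropping it are left out.

```python
def pdf_y_to_pixel_row(y_pdf, page_height_pt, image_height_px):
    """Convert a PDF y coordinate to an image row.

    Assumption: PDF y is measured upward from the bottom of the page,
    image rows are measured downward from the top, so the axis is flipped.
    """
    if page_height_pt <= 0:
        raise ValueError("page height must be positive")
    # Proportion: row / image_height_px == (page_height_pt - y_pdf) / page_height_pt
    row = round((page_height_pt - y_pdf) * image_height_px / page_height_pt)
    # Clamp so coordinates slightly outside the page still give a valid row.
    return max(0, min(image_height_px, row))


def crop_box(y_top_pdf, y_bottom_pdf, page_height_pt, image_width_px, image_height_px):
    """Return a full-width (left, upper, right, lower) pixel box for a vertical slice.

    y_top_pdf is where the wanted question starts and y_bottom_pdf is where
    it ends (for example, where the next question number begins).
    """
    upper = pdf_y_to_pixel_row(y_top_pdf, page_height_pt, image_height_px)
    lower = pdf_y_to_pixel_row(y_bottom_pdf, page_height_pt, image_height_px)
    return (0, upper, image_width_px, lower)


# Hypothetical numbers: a 792 pt tall page rendered as a 1700 x 2200 px image,
# with the question spanning PDF y = 700 down to y = 400.
box = crop_box(700, 400, 792, 1700, 2200)
print(box)
```

The resulting tuple is in the (left, upper, right, lower) order that image libraries such as Pillow expect for cropping, so it can be passed to the crop step once the page has been rendered at the chosen pixel size.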
