I have very little experience manipulating PDFs with Python; so far I have only read them using the 'pdfreader' library. I have a PDF (in this case a past exam paper), and I want to split a page at the point where a given question number appears, say 12 (it is formatted as "12."), and save the part containing question 12 to a new PDF. How do I do this?

I'm not a very good programmer, so sorry if my question is stupid, but I searched the internet and could not find how to do this.

  • Actually it is much easier than this: I already wrote about 30 lines of Python to find the question and the page on which a specific word appears. My problem now is splitting the page at a specific point and saving the split part into another PDF. – Riccardo Piana Feb 25 '22 at 18:59
  • @KJ thx, do you know of a library with which I can do this? – Riccardo Piana Feb 27 '22 at 12:20
  • @KJ, thx for the help, I found this post https://stackoverflow.com/questions/22898145/how-to-extract-text-and-text-coordinates-from-a-pdf-file that, together with your comments, can help me find a solution. Also, what do you mean by "I don't have that many days left?" – Riccardo Piana Feb 27 '22 at 12:43

1 Answer

In the end, the solution was to convert the PDF page into an image, crop it where I wanted, and then convert it back to a PDF. I used pdfminer to get the coordinates of the text. To find the matching pixels in the image, I set up a proportion between the height of the page in PDF coordinates and the height of the image in pixels, which let me convert coordinates from one system to the other.
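
For anyone who wants the proportion spelled out, here is a minimal sketch of the conversion step only, not my exact code. The function names `pdf_y_to_pixel_row` and `crop_box_for_span` are made up for illustration. It assumes the page's coordinate origin is at the bottom-left with y growing upward (as in the bounding boxes pdfminer reports), that the page box starts at y = 0, and that the image has its origin at the top-left with y growing downward, so the y axis has to be flipped as well as scaled.

```python
def pdf_y_to_pixel_row(y_pdf, page_height_pt, image_height_px):
    """Convert a PDF y coordinate (points, origin bottom-left)
    to an image row (pixels, origin top-left)."""
    # The proportion: pixels per PDF point along the vertical axis.
    scale = image_height_px / page_height_pt
    # Flip the axis (PDF y grows upward, image rows grow downward), then scale.
    return round((page_height_pt - y_pdf) * scale)


def crop_box_for_span(y_top_pdf, y_bottom_pdf, page_height_pt,
                      image_width_px, image_height_px):
    """Build a (left, upper, right, lower) pixel box covering the full page
    width between two PDF y coordinates (hypothetical helper)."""
    upper = pdf_y_to_pixel_row(y_top_pdf, page_height_pt, image_height_px)
    lower = pdf_y_to_pixel_row(y_bottom_pdf, page_height_pt, image_height_px)
    return (0, upper, image_width_px, lower)


# Example: an A4 page (about 595 x 842 points) rendered at twice that size.
# Assume question 12 starts at y = 700 pt and the next question at y = 500 pt.
box = crop_box_for_span(700, 500, page_height_pt=842,
                        image_width_px=1190, image_height_px=1684)
print(box)
```

The resulting tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed straight to it if the page was rendered with Pillow-compatible tooling. If the page's box does not start at y = 0, subtract its lower y value from the coordinates first.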