Let's assume I have a PDF file with 300 pages. It actually contains 100 forms, each exactly 3 pages long. The first page of each form contains a text value that determines which output file the form should go to. This value is the letter "G" followed by three digits (e.g. "G100", "G201"). This is where my problem starts: the forms are mixed up in the PDF. Here is what I mean:
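A minimal sketch of detecting the form code in the text of a form's first page, assuming the code appears in the extracted text as a standalone token ("G" plus exactly three digits). The name `find_form_code` is hypothetical, and depending on how the PDF was produced, text extraction may split or space out characters, in which case the pattern would need adjusting.

```python
import re

# Assumed pattern: a capital "G" followed by exactly three digits, as a
# standalone token (so "G1000" or "XG100" do not match).
FORM_CODE = re.compile(r"\bG\d{3}\b")


def find_form_code(page_text):
    """Return the first form code found in page_text, or None if there is none."""
    match = FORM_CODE.search(page_text or "")
    return match.group() if match else None
```

For example, `find_form_code("Form type: G201")` returns `"G201"`, while `find_form_code("G1000")` returns `None` because of the word boundaries.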
1st page: G100
4th page: G201
7th page: G100
10th page: G256
...
298th page: G100
Based on that, I need to create an output file "G100.pdf" containing pages 1-3, 7-9 and 298-300, and the same for every other form type. I don't know in advance how many types there will be, what they will be called (beyond the pattern described above), or how many page ranges each will have.
Is there a way to do this in Python? I've seen how PyPDF2 can be used to split pages, but I don't know how to do it efficiently for large PDFs where the pages belonging to each output file are not contiguous.
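One possible approach, sketched under stated assumptions: open the PDF once, extract text only from the first page of each form (every third page), group page indices by form code, then write one file per code. It assumes the `pypdf` package (the maintained successor of PyPDF2; PyPDF2 3.x exposes the same `PdfReader`, `PdfWriter`, `extract_text` and `add_page` names) and a page count that is a multiple of 3. The function names `plan_split` and `split_pdf` and the `"UNKNOWN"` bucket for forms without a recognizable code are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern: "G" followed by exactly three digits, as a standalone token.
FORM_CODE = re.compile(r"\bG\d{3}\b")
PAGES_PER_FORM = 3  # every form is exactly three pages long


def plan_split(first_page_texts, pages_per_form=PAGES_PER_FORM):
    """Map each form code to the 0-based indices of all pages in its forms.

    first_page_texts[i] is the extracted text of the first page of form i.
    Codes keep the order in which they first appear in the input PDF.
    """
    plan = defaultdict(list)
    for form_no, text in enumerate(first_page_texts):
        match = FORM_CODE.search(text or "")
        # Hypothetical fallback: forms without a recognizable code go to "UNKNOWN".
        code = match.group() if match else "UNKNOWN"
        start = form_no * pages_per_form
        plan[code].extend(range(start, start + pages_per_form))
    return dict(plan)


def split_pdf(src, out_dir, pages_per_form=PAGES_PER_FORM):
    """Write one <code>.pdf per form code into out_dir and return the plan."""
    # Imported here so plan_split() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    n_pages = len(reader.pages)
    if n_pages % pages_per_form:
        raise ValueError(f"{n_pages} pages is not a multiple of {pages_per_form}")

    # Text extraction is typically the slow part, so run it only on the
    # first page of each form.
    first_page_texts = [
        reader.pages[i].extract_text() for i in range(0, n_pages, pages_per_form)
    ]
    plan = plan_split(first_page_texts, pages_per_form)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for code, page_indices in plan.items():
        writer = PdfWriter()
        for i in page_indices:
            writer.add_page(reader.pages[i])  # pages keep their original order
        with open(out_dir / f"{code}.pdf", "wb") as fh:
            writer.write(fh)
    return plan
```

With a hypothetical input, `split_pdf("forms.pdf", "out")` would write `out/G100.pdf`, `out/G201.pdf` and so on. Separating the planning step from the writing step means the input is read only once and the grouping can be checked (or printed as page ranges) before any files are written.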