Obtaining the starting page number of a section using pdfminer.

Question

Is there a way to obtain the page number of a particular section in a PDF using pdfminer or any other Python package? I need the page number of the index section of a PDF.

Have you tried http://stackoverflow.com/questions/12605170/extract-text-per-page-with-python-pdfminer? — Preston Martin, Oct 05 '16 at 14:13
I don't have a problem extracting the text but I want to know the page number of a particular section of the pdf that is obtained by using the document.get_outlines() function — Code-crazy9, Oct 05 '16 at 16:13

score 3 · Answer 1 · answered May 10 '18 at 14:42

I know this is an old post, but I have been having the same issue. The only solution that has produced some promising results is:

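from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

def pdf_pages(file):
    # Keep the file open while the pages are read, then close it.
    with open(file, "rb") as fp:
        parser = PDFParser(fp)
        document = PDFDocument(parser)
        # Pages come back in document order, so counting from 1 gives the page number.
        for page_number, page in enumerate(PDFPage.create_pages(document), start=1):
            print(page_number, page)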
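
The loop above only numbers the pages. To tie an entry from document.get_outlines() to one of those numbers, something along these lines might work. It is a rough sketch modelled on the destination handling in pdfminer's dumppdf tool, and I'm assuming pdfminer.six here. The names page_number_for and section_start_page are my own, not part of pdfminer, and error handling (for example a named destination that cannot be found) is left out.

```python
def page_number_for(dest, page_numbers):
    """Map a resolved destination array to a 1-based page number, or None."""
    if not isinstance(dest, (list, tuple)) or not dest:
        return None
    # The first element is normally an indirect reference to the page object;
    # its objid is matched against the pageid values collected per page.
    return page_numbers.get(getattr(dest[0], "objid", None))


def section_start_page(path, section_title):
    """Return the 1-based page where the outline entry `section_title` starts."""
    # Imported here so page_number_for() can be tried without pdfminer installed.
    from pdfminer.pdfdocument import PDFDocument, PDFNoOutlines
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import resolve1
    from pdfminer.psparser import PSLiteral

    with open(path, "rb") as fp:
        document = PDFDocument(PDFParser(fp))
        # pageid -> page number, counting from 1 in document order.
        page_numbers = {
            page.pageid: number
            for number, page in enumerate(PDFPage.create_pages(document), 1)
        }

        def resolve(dest):
            # A destination may be an array, a named destination, or a dict with /D.
            dest = resolve1(dest)
            if isinstance(dest, PSLiteral):
                dest = dest.name
            if isinstance(dest, (str, bytes)):
                dest = resolve1(document.get_dest(dest))
            if isinstance(dest, dict):
                dest = resolve1(dest.get("D"))
            return dest

        wanted = section_title.strip().lower()
        try:
            for _level, title, dest, action, _se in document.get_outlines():
                if title.strip().lower() != wanted:
                    continue
                # Some outline entries carry a GoTo action instead of a /Dest.
                action = resolve1(action)
                if dest is None and isinstance(action, dict):
                    dest = action.get("D")
                return page_number_for(resolve(dest), page_numbers)
        except PDFNoOutlines:
            return None  # the PDF has no outline (bookmarks) at all
    return None
```

Calling section_start_page("report.pdf", "Index") (file name made up) should give the page number of the first outline entry titled "Index", or None if there is no such entry or its destination could not be mapped to a page. This only helps when the PDF actually has an outline; without one you would have to search the extracted text page by page.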

Hope this helps.

Thanks
