Is there a way to get the page number of a particular section in a PDF using pdfminer or another Python package? I need the page number of the index section of a PDF.

  • Have you tried http://stackoverflow.com/questions/12605170/extract-text-per-page-with-python-pdfminer? – Preston Martin Oct 05 '16 at 14:13
  • I don't have a problem extracting the text, but I want to know the page number of a particular section of the PDF that is obtained by using the document.get_outlines() function – Code-crazy9 Oct 05 '16 at 16:13

1 Answer

I know this is an old post, but I have been having the same issue. The only approach that has produced promising results for me is to enumerate the pages, which prints each 1-based page number next to its PDFPage object:

from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

def pdf_pages(file):
    # Open in binary mode; the context manager closes the file afterwards.
    with open(file, "rb") as fp:
        parser = PDFParser(fp)
        document = PDFDocument(parser)
        # Number pages from 1, in document order.
        for page_number, page in enumerate(PDFPage.create_pages(document), start=1):
            print(page_number, page)
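
To go from an entry returned by document.get_outlines() to a page number, one option is to map each page's object id to its position and then resolve the outline entry's destination against that map. Below is a rough sketch of that idea, loosely following what pdfminer's dumppdf.py tool does. It assumes pdfminer.six; the function names outline_page_numbers and find_section_page are my own, not part of the library, and I have not tried it on every kind of PDF (entries whose destination cannot be resolved to a page come back as None).

```python
def find_section_page(entries, keyword):
    """Return the page of the first (title, page) entry whose title
    contains keyword (case-insensitive), or None if nothing matches."""
    for title, page in entries:
        if keyword.lower() in title.lower():
            return page
    return None


def outline_page_numbers(path):
    """Return a list of (title, page_number) for each outline entry.

    Hypothetical helper, not a pdfminer API. page_number is 1-based,
    or None when the destination could not be mapped to a page.
    """
    # Imported here so find_section_page works without pdfminer installed.
    from pdfminer.pdfdocument import PDFDocument, PDFNoOutlines
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import PDFObjRef, resolve1
    from pdfminer.psparser import PSLiteral

    with open(path, "rb") as fp:
        document = PDFDocument(PDFParser(fp))

        # Map each page object's id to its 1-based page number.
        page_numbers = {
            page.pageid: number
            for number, page in enumerate(PDFPage.create_pages(document), start=1)
        }

        def resolve_dest(dest):
            # Named destinations have to be looked up in the document.
            if isinstance(dest, (str, bytes)):
                dest = resolve1(document.get_dest(dest))
            elif isinstance(dest, PSLiteral):
                dest = resolve1(document.get_dest(dest.name))
            # Some destinations are dictionaries wrapping the real one in "D".
            if isinstance(dest, dict):
                dest = dest["D"]
            return resolve1(dest)

        try:
            outlines = list(document.get_outlines())
        except PDFNoOutlines:
            return []  # the PDF has no outline (bookmarks) at all

        results = []
        for _level, title, dest, action, _se in outlines:
            # An entry carries either a destination or an action holding one.
            action = resolve1(action)
            if not dest and isinstance(action, dict):
                dest = action.get("D")
            page = None
            if dest:
                target = resolve_dest(dest)
                # An explicit destination is a list starting with a page reference.
                if isinstance(target, list) and target and isinstance(target[0], PDFObjRef):
                    page = page_numbers.get(target[0].objid)
            results.append((title, page))
        return results
```

For the index section you could then call something like find_section_page(outline_page_numbers("book.pdf"), "index"), where "book.pdf" is just a placeholder path. This only works if the PDF actually has outline entries and one of them has a title containing the keyword.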

Hope this helps.

Thanks

Infinite_Loop