Is there any method to obtain the page number of a particular section in a PDF using pdfminer or any other Python package? I need to obtain the page number of the index section of a PDF.
- Have you tried http://stackoverflow.com/questions/12605170/extract-text-per-page-with-python-pdfminer? – Preston Martin Oct 05 '16 at 14:13
- I don't have a problem extracting the text, but I want to know the page number of a particular section of the PDF, as returned by the document.get_outlines() function. – Code-crazy9 Oct 05 '16 at 16:13
1 Answer
I know this is an old post, but I have been having the same issue. The only solution that has produced some promising results is:
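```python
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser


def pdf_pages(file):
    # Keep the file open while pdfminer parses the pages lazily.
    with open(file, "rb") as fp:
        parser = PDFParser(fp)
        document = PDFDocument(parser)
        # enumerate() counts from 0; add 1 to print human-readable page numbers.
        for page_index, page in enumerate(PDFPage.create_pages(document)):
            print(page_index + 1, page)
```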
Hope this helps.
Thanks

Infinite_Loop