I have PDF documents, and I know how to extract text from them.
I need to extract not only the text but also the coordinates associated with it.
Here is my code:
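from PyPDF2 import PdfReader

pdf_path = 'docs/doc_3.pdf'
pdf = PdfReader(pdf_path)
# Pages are 0-indexed, so index 1 is the second page of the document.
page_1_object = pdf.pages[1]
# extract_text() replaces the deprecated extractText().
page_1_object.extract_text().split("\n")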
The result is:
['Creating value for all stakeholders',
'Anglo\xa0American is re-imagining mining to improve people’s lives.']
I need the geometry (coordinates) associated with each extracted paragraph, for example something like this:
[['Creating value for all stakeholders', [1, 2, 3, 4]],
 ['Anglo\xa0American is re-imagining mining to improve people’s lives.', [7, 8, 9, 10]]]
How can I accomplish this?
Thanks,
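
One possible direction, as a minimal sketch rather than a definitive answer: recent PyPDF2 releases (and the successor package pypdf) let extract_text take a visitor_text callback that receives each text fragment together with the current transformation matrix, the text matrix, the font dictionary, and the font size. The sketch assumes the installed version supports that keyword. The names PositionCollector and extract_with_positions are hypothetical and made up for illustration. It records only the start point (x, y) of each fragment, taken from the text matrix, not a full four-value bounding box, and fragments are not necessarily whole paragraphs.

```python
from typing import List, Tuple

# Each entry pairs a text fragment with the (x, y) point where it starts.
PositionedText = Tuple[str, Tuple[float, float]]


class PositionCollector:
    """Hypothetical helper: gathers text fragments with their start coordinates."""

    def __init__(self) -> None:
        self.items: List[PositionedText] = []

    def __call__(self, text, cm, tm, font_dict, font_size) -> None:
        # tm is the text matrix [a, b, c, d, e, f]; e and f hold the translation.
        stripped = text.strip()
        if stripped:  # skip whitespace-only fragments
            self.items.append((stripped, (float(tm[4]), float(tm[5]))))


def extract_with_positions(pdf_path: str, page_index: int = 0) -> List[PositionedText]:
    # Imported here so the collector above can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    page = PdfReader(pdf_path).pages[page_index]
    collector = PositionCollector()
    # Assumption: this PyPDF2 version accepts the visitor_text keyword.
    page.extract_text(visitor_text=collector)
    return collector.items
```

For the document above, the call would look like extract_with_positions('docs/doc_3.pdf', 1). The coordinates are in PDF user space units, where the origin is usually the bottom-left corner of the page, and fragments would still need to be grouped into paragraphs before a bounding box per paragraph could be derived.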