The code below reports the page number printed at the bottom of each page, but I need the actual page number within the file (the page's position in the PDF), not the document's printed page number. I have attached a screenshot with the page number I need to extract marked in red. Could you please look into it?
Here is the code I have tried.
import PyPDF2
import re
obj = PyPDF2.PdfFileReader(r"avnet_202209 (1).pdf")
pgno = obj.getNumPages()
S = "Basis of presentation and new accounting pronouncements"
for i in range(0, pgno):
    PgOb = obj.getPage(i)
    Text = PgOb.extractText()
    if re.search(S, Text):
        print("String Found on Page: " + str(i))
The output was:
String Found on Page: 7
String Found on Page: 22

Required output:
String Found on Page: 8
String Found on Page: 23
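
The two outputs differ by exactly one on every match, which suggests that the loop variable is a zero-based page index (the first page of the file is index 0) rather than the printed page number. A minimal sketch of counting from 1 instead, assuming that is the cause: the helper name `find_phrase_pages` and the sample page texts are hypothetical and only stand in for the text extracted from the real PDF.

```python
import re


def find_phrase_pages(page_texts, phrase):
    """Return the 1-based page numbers whose text contains `phrase`."""
    # re.escape treats the phrase as literal text, so characters such as
    # "." or "(" in the search string are not read as regex syntax.
    pattern = re.compile(re.escape(phrase))
    # enumerate(..., start=1) turns the 0-based list position into the
    # page count a PDF viewer shows (first page of the file = 1).
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if pattern.search(text)
    ]


# Hypothetical stand-in for the text extracted from a three-page PDF.
sample_pages = [
    "Cover page",
    "Basis of presentation and new accounting pronouncements",
    "Notes",
]

for page_number in find_phrase_pages(sample_pages, "Basis of presentation"):
    print("String Found on Page: " + str(page_number))
```

With the code from the question, the same effect should come from building the list with `obj.getPage(i).extractText()` for each `i` in `range(pgno)`, or simply from printing `str(i + 1)` instead of `str(i)`.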