I have a PDF file that I am reading with PyMuPDF using the code below:
```python
import fitz  # this is PyMuPDF

with fitz.open('file.pdf') as doc:
    text = ""
    for page in doc:
        text += page.get_text()  # snake_case name of the older getText()
```
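To see where the header and footer actually sit on a page, it can help to list each text block with its vertical position. PyMuPDF's `page.get_text("blocks")` returns tuples of the form `(x0, y0, x1, y1, text, block_no, block_type)`, with y growing downward from the top of the page. The helper below is a minimal sketch; `describe_blocks` is a hypothetical name, not part of PyMuPDF, and it assumes it is given that list of tuples.

```python
def describe_blocks(blocks):
    """Return one line per text block: its vertical extent and a short preview.

    Assumes `blocks` is the list returned by PyMuPDF's page.get_text("blocks"),
    i.e. tuples (x0, y0, x1, y1, text, block_no, block_type).
    """
    lines = []
    for x0, y0, x1, y1, text, block_no, block_type in blocks:
        if block_type != 0:  # block_type 1 is an image block; skip it
            continue
        preview = " ".join(text.split())[:40]  # collapse newlines, truncate
        lines.append(f"y={y0:.1f}-{y1:.1f}: {preview}")
    return lines
```

Printing `describe_blocks(page.get_text("blocks"))` for a few pages inside the loop above shows which y-ranges the repeated header and footer lines occupy.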
Is there a way to ignore the header and footer while reading it?
I tried converting the PDF to DOCX, since headers are easier to remove there, but the conversion changes the formatting of the file I am working on.
Is there a way to have PyMuPDF do this while reading?
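A minimal sketch of one approach, assuming the header and footer sit in fixed-height bands at the top and bottom of every page: keep only the text blocks that lie entirely between those bands. The 50-point margins and the names `body_text` and `read_body_text` are hypothetical and need tuning per document. PyMuPDF's `get_text` also accepts a `clip` rectangle (for example `page.get_text("text", clip=fitz.Rect(0, 50, page.rect.width, page.rect.height - 50))`) that restricts extraction to a region in a single call; the block filter below does a similar job but is easier to adapt, e.g. to drop only blocks matching known header text.

```python
def body_text(blocks, page_height, header_margin=50.0, footer_margin=50.0):
    """Join the text of blocks lying fully between the header and footer bands.

    Assumes `blocks` is PyMuPDF's page.get_text("blocks") output:
    (x0, y0, x1, y1, text, block_no, block_type), with y measured downward
    from the top of the page. Margins are in points and are assumptions.
    """
    top = header_margin
    bottom = page_height - footer_margin
    kept = []
    for x0, y0, x1, y1, text, block_no, block_type in blocks:
        if block_type != 0:  # skip image blocks
            continue
        if y0 >= top and y1 <= bottom:  # block lies inside the body region
            kept.append(text)
    return "".join(kept)


def read_body_text(path, header_margin=50.0, footer_margin=50.0):
    """Read a whole PDF with PyMuPDF, dropping blocks in the header/footer bands."""
    import fitz  # PyMuPDF; imported here so body_text() can be used without it

    text = ""
    with fitz.open(path) as doc:
        for page in doc:
            blocks = page.get_text("blocks")
            text += body_text(blocks, page.rect.height,
                              header_margin, footer_margin)
    return text
```

Calling `read_body_text('file.pdf')` would replace the loop in the question. If the header height varies between pages, the margins can be chosen per page after inspecting the block positions.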