I am new to Python and have been working on a project that creates a new PDF with highlighted text. I am using PyMuPDF to extract the text, and I store each piece of text together with its font size and its index.
I found a way to highlight text, but it searches for and highlights every occurrence of the string (text):
import fitz
### READ IN PDF
doc = fitz.open("input.pdf")
page = doc[0]
### SEARCH
text = "Sample text"
text_instances = page.searchFor(text)
### HIGHLIGHT
for inst in text_instances:
    highlight = page.addHighlightAnnot(inst)
### OUTPUT
doc.save("output.pdf", garbage=4, deflate=True, clean=True)
I need a way to highlight one specific line or word (not every match), or alternatively a way to store the rect coordinates of each line so I can highlight by those.
For example, if there is a heading called "Summary" and the paragraph under that heading also contains the word "summary", I want to highlight only the heading (or only the occurrence in the paragraph), not both.
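To make the "store the rect coordinates of each line" idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes a recent PyMuPDF release, where the methods use snake_case names (`page.get_text("dict")`, `page.add_highlight_annot`) instead of the older camelCase names in my code above. The helper names `collect_lines` and `highlight_record` are my own, and the sample dictionary is hand-made to mimic the shape that `page.get_text("dict")` returns.

```python
def collect_lines(page_dict):
    """Return one record per text line: its text, largest font size, and bbox."""
    records = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block; image blocks are skipped
            continue
        for line in block.get("lines", []):
            spans = line.get("spans", [])
            if not spans:
                continue
            records.append({
                "text": "".join(s["text"] for s in spans).strip(),
                "size": max(s["size"] for s in spans),
                "bbox": tuple(line["bbox"]),
            })
    return records


def highlight_record(page, record):
    """Highlight exactly one stored line; `page` is assumed to be a PyMuPDF page."""
    import fitz  # imported here so collect_lines works without PyMuPDF installed
    return page.add_highlight_annot(fitz.Rect(record["bbox"]))


# Hand-made stand-in for page.get_text("dict"), only to show the record layout.
sample = {
    "blocks": [
        {"type": 0, "lines": [
            {"bbox": (72, 72, 150, 92),
             "spans": [{"text": "Summary", "size": 18}]},
        ]},
        {"type": 1, "bbox": (72, 100, 300, 200)},  # image block, ignored
        {"type": 0, "lines": [
            {"bbox": (72, 210, 400, 224),
             "spans": [{"text": "This summary ", "size": 11},
                       {"text": "covers the results.", "size": 11}]},
        ]},
    ]
}
lines = collect_lines(sample)
```

With a real document, the idea would be to call `collect_lines(page.get_text("dict"))` once per page, keep the list, and later pass a single chosen record to `highlight_record(page, record)` so that only that line gets highlighted.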
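For the heading case, one approach I can imagine is to use the font size I already store: among all stored entries that contain the word, pick the one with the largest font. This is only a sketch under assumptions. It assumes each stored entry is a dict with `text`, `size`, and `bbox` keys (my own layout, not something PyMuPDF defines), and that the heading really is set in a larger font than the body text. The function name `pick_largest_match` is made up for illustration.

```python
def pick_largest_match(records, word):
    """Return the record containing `word` with the largest font size, or None."""
    needle = word.lower()
    matches = [r for r in records if needle in r["text"].lower()]
    if not matches:
        return None
    # On equal sizes, max() keeps the first match in reading order.
    return max(matches, key=lambda r: r["size"])


# Example entries in the assumed layout: a heading and a paragraph line.
records = [
    {"text": "Summary", "size": 18, "bbox": (72, 72, 150, 92)},
    {"text": "This summary covers the results.", "size": 11,
     "bbox": (72, 210, 400, 224)},
]
heading = pick_largest_match(records, "summary")
```

The selected entry could then be highlighted on its own with something like `page.add_highlight_annot(fitz.Rect(heading["bbox"]))`, leaving the occurrence in the paragraph untouched.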