
I'd like to mark several keywords in a PDF document using Python and PyMuPDF.

The code looks as follows (source: original code):

import fitz

doc = fitz.open("test.pdf")

page = doc[0]

text = "result"

text_instances = page.searchFor(text)

for inst in text_instances:
    highlight = page.addHighlightAnnot(inst)
    highlight.setColors(colors='Red')
    highlight.update()


doc.save("output.pdf")

However, the text only gets marked on one page. I tried changing the code as described in the PyMuPDF documentation so that it loops over all pages:

import fitz

doc = fitz.open("test.pdf")
for page in doc.pages(1, 3, 1):
    pass

text = "result"

text_instances = page.searchFor(text)

for inst in text_instances:
    highlight = page.addHighlightAnnot(inst)
    highlight.setColors(colors='Red')
    highlight.update()


doc.save("output.pdf")

Unfortunately, it still only marks the keywords on one page. What do I need to change so that the keywords get marked on all pages?

danik
  • Your indentation is wrong. At the moment, the save occurs inside the loop. This means output.pdf is overwritten on every iteration, so only the last save survives. – Alan Apr 15 '21 at 19:24
  • @Alan I changed the indentation for `pass`, as that indentation was wrong. Did you mean this indentation, or did I make another error with the indentation? – danik Apr 15 '21 at 19:43
  • doc.save("output.pdf") is part of the `inst` loop – Alan Apr 15 '21 at 19:49
  • @Alan You're right, I missed that one too when uploading the question. However, it still doesn't work properly. – danik Apr 15 '21 at 19:56

1 Answer


There are two major issues with your code:

  1. Indentation
  2. The start of the slice is zero-based

Otherwise, your understanding of the code seems fine.

for page in doc.pages(1, 3, 1):
    pass

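If you want to loop over pages, you need to put your highlight code inside the page loop. Your loop body is only `pass`, so when the loop finishes, `page` still refers to the last page it visited, and the search and highlighting run on that one page only. In addition, `doc.pages(1, 3, 1)` starts on the second page, not the first, because the first page has index 0.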
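Both effects can be reproduced without a PDF. In this minimal sketch, a plain list of page labels stands in for the document (an assumption for illustration), and `doc.pages(1, 3, 1)` is modelled with `range(1, 3, 1)`, since the `pages()` arguments have the same meaning as those of the built-in `range()`:

```python
# Stand-in for a three-page document; index 0 is the first page.
page_labels = ["page 1", "page 2", "page 3"]

# doc.pages(1, 3, 1) visits the same indices as range(1, 3, 1): 1 and 2.
visited = [page_labels[i] for i in range(1, 3, 1)]
print(visited)  # ['page 2', 'page 3']

for page in (page_labels[i] for i in range(1, 3, 1)):
    pass  # nothing happens inside the loop

# After the loop, `page` is still bound to the last page it visited,
# so code placed after the loop only ever sees that one page.
print(page)  # page 3
```

The first page is skipped because the range starts at index 1, and only the last visited page is highlighted because the search code runs after the loop instead of inside it.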

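#! /usr/bin/env python
# -*- coding: utf-8 -*-

import fitz  # PyMuPDF

doc = fitz.open("test.pdf")

text = "result"

# page = doc[0] would select only the first page.
# doc.pages(start, stop, step) works like range(): indices are zero-based
# and stop is exclusive, so (0, 3, 1) visits the first three pages.
for page in doc.pages(0, 3, 1):
    # The search and the highlighting must happen inside the page loop.
    # search_for, add_highlight_annot and set_colors are the snake_case
    # names of searchFor, addHighlightAnnot and setColors.
    text_instances = page.search_for(text)

    for inst in text_instances:
        highlight = page.add_highlight_annot(inst)
        # colors= expects a dict, not 'Red'; a highlight's color is its
        # stroke color, given as RGB floats: (1, 0, 0) is red.
        highlight.set_colors(stroke=(1, 0, 0))
        highlight.update()

doc.save("output.pdf")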
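The loop above still stops after the third page. To cover every page and several keywords, as the question asks, one option is to iterate over the `Document` itself, which yields all of its pages in order. This is a minimal sketch, assuming a PyMuPDF version with the snake_case method names; the helper name `highlight_keywords` and its return value (a dict mapping zero-based page number to the number of highlights added) are hypothetical choices for illustration, not part of PyMuPDF.

```python
def highlight_keywords(doc, keywords, color=(1, 0, 0)):
    """Highlight every occurrence of each keyword on every page of `doc`.

    Hypothetical helper: `doc` is an already opened PyMuPDF Document.
    Returns {zero-based page number: number of highlights added}.
    `color` is an RGB tuple of floats in [0, 1]; (1, 0, 0) is red.
    """
    counts = {}
    # Iterating over a Document yields all of its pages,
    # so no start/stop/step is needed.
    for page in doc:
        added = 0
        for keyword in keywords:
            # search_for returns one rectangle per occurrence on this page.
            for rect in page.search_for(keyword):
                annot = page.add_highlight_annot(rect)
                # A highlight's color is its stroke color.
                annot.set_colors(stroke=color)
                annot.update()
                added += 1
        counts[page.number] = added
    return counts
```

With PyMuPDF installed, you would open the file with `doc = fitz.open("test.pdf")`, call `highlight_keywords(doc, ["result"])` with whatever keywords you need, and then write the result once with `doc.save("output.pdf")` after all pages have been processed.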
Alan