3

I'm looking for a way to highlight words (e.g. "some words [0-9]"), or better yet the whole line containing the given words, in some single-page PDFs. It will be part of a batch process on Windows, so I need a command-line way to do this. I've looked at Ghostscript, but I cannot see how to use it for this.

I hope I haven't done anything wrong. I looked at other questions, mainly "Add comments to PDF files automagically with regular expressions", but that did not really help me. Also, English is not my native language, as you may have noticed already.

Thanks in advance

Community

1 Answer

0

Ghostscript can't do this. Generalized text tools can't either, because (1) most PDFs store their text commands in compressed blocks; (2) text is often not 'encoded' in any standard way (sometimes the font provides a ToUnicode map, but often not even that); and (3) what looks like text may not be text at all, it may just be bitmapped images.
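To illustrate point (1), here is a minimal Python sketch. It assumes the stream uses the common /FlateDecode filter, which is zlib data; the content stream itself is a made-up example, not taken from a real file. It runs the same plain byte search twice, once on the bytes as they would be stored and once after inflating them.

```python
import zlib

# Hypothetical content stream; the operators are standard PDF, the text is made up.
content = b"BT /F1 12 Tf 72 720 Td (some words 42) Tj ET"


def contains_text(data: bytes, needle: bytes) -> bool:
    """Plain byte search, which is all a generic text tool does."""
    return needle in data


# What a /FlateDecode stream holds on disk: zlib-compressed bytes.
stored = zlib.compress(content)

# Search the bytes as stored, then search again after inflating them.
print(contains_text(stored, b"some words"))
print(contains_text(zlib.decompress(stored), b"some words"))
```

Only the inflated bytes are guaranteed to contain the text operators, and even then the string may not be readable because of point (2).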

A tool like 'mutool clean -d' can "expand" a PDF so that (1) is solved and the text commands can be found in the PDF, but you may still have things like:

(!"##$) Tj

instead of Hello, because of (2). And then there is the other way text is drawn in PDF, with kerning, even if a standard encoding is used:

[(H) 120 (e) 80 (l) 95 (l) 95 (o)] TJ
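As a rough illustration of why a plain search for "Hello" misses the kerned form, the following Python sketch joins the string pieces of a TJ array and drops the kerning numbers. The function name is hypothetical, and the sketch assumes the strings contain no escaped or nested parentheses and that the font uses a standard encoding, so it does nothing about problem (2).

```python
import re


def text_from_tj_array(operand: str) -> str:
    """Join the string pieces of a TJ array, dropping the kerning numbers."""
    # Assumption: literal strings contain no escaped or nested parentheses.
    pieces = re.findall(r"\(([^()\\]*)\)", operand)
    return "".join(pieces)


line = "[(H) 120 (e) 80 (l) 95 (l) 95 (o)] TJ"
print(text_from_tj_array(line))  # prints: Hello
```

A real tool would also have to handle hex strings, escapes, and words split across several TJ or Tj operators.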

Highlighting might be possible, but it would be very difficult and would require programming, and it still would not address (3), which would require OCR of the bitmapped text.

Ray Johnston