pdfgrep works like grep except that it acts on pages instead of lines. How can I craft a regular expression with a newline character?

I want to look for a, followed by any number of characters except linebreaks, followed by b, but pdfgrep 'a[^\n]*b' doesn't work, whereas pdfgrep 'a.*b' returns results that span multiple lines. (I've examined the output with xxd to confirm that these newlines are indeed \x0A.)

Wiktor Stribiżew
JellicleCat

1 Answer

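By default, pdfgrep uses a POSIX-compliant regex flavor in which . matches any character, including line breaks. In a POSIX bracket expression the backslash is also taken literally, so [^\n] means "any character other than \ or n" rather than "any character other than a newline", which is why your first pattern doesn't behave as expected.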

Fortunately, pdfgrep also supports the PCRE regex flavor through the -P flag. In PCRE, . by default matches any character except line break characters.

Thus, you can use

pdfgrep -P 'a.*b'
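
To make the difference concrete, here is a small Python sketch. Python's re module is Perl-style rather than PCRE itself, so treat it as an analogy and not as pdfgrep's actual engine; the sample text in `page` is made up. The default mode behaves like -P, and re.DOTALL mimics a dot that also matches line breaks.

```python
import re

# Hypothetical page text: one candidate sits on a single line,
# the other would have to cross line breaks to reach its "b".
page = "xx a-1-b xx\nxx a-2\n-b xx"

# Perl-style default: '.' stops at '\n', so a match stays on one line.
single_line = re.findall(r"a.*b", page)

# DOTALL lets '.' match '\n' as well, so the greedy match spans lines.
multi_line = re.findall(r"a.*b", page, flags=re.DOTALL)

# In Perl-style flavors, \n inside a bracket expression is a newline escape,
# so this pattern is also confined to one line.
bracket = re.findall(r"a[^\n]*b", page)

print(single_line)
print(multi_line)
print(bracket)
```

Only the first and third patterns keep the match within a single line, which is the behaviour the question asks for.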
Wiktor Stribiżew