pdfgrep pattern to include/exclude linebreak

Question

pdfgrep works like grep except that it acts on pages instead of lines. How can I craft a regular expression with a newline character?

I want to look for a, followed by any number of characters except linebreaks, followed by b, but pdfgrep 'a[^\n]*b' doesn't work, whereas pdfgrep 'a.*b' returns results that span multiple lines. (I've examined the output with xxd to confirm that these newlines are indeed \x0A.)

Try `pdfgrep -P 'a.*b'` – Wiktor Stribiżew, Jul 08 '20 at 22:43
Thanks, @WiktorStribiżew! – JellicleCat, Jul 08 '20 at 22:48

score 0 · Accepted Answer · answered Jul 08 '20 at 22:49

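By default, pdfgrep uses a POSIX-compliant regex flavor, in which `.` matches any character, including line break characters. In a POSIX bracket expression the backslash is an ordinary character, so `[^\n]` excludes `\` and `n` rather than the newline, which is why that pattern does not help here.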

Fortunately, pdfgrep also supports the PCRE regex flavor via the `-P` flag. In PCRE, `.` matches any character except line break characters.

Thus, you can use

pdfgrep -P 'a.*b'
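
To see the difference between the two dot behaviours without a PDF at hand, here is a minimal sketch in Python. It is only an analogy: Python's `re` module is not the engine pdfgrep uses, and `page_text` is a made-up stand-in for the text of one page. By default Python's `.` skips newlines (like PCRE under `-P`), while `re.DOTALL` lets it cross them (like the default behaviour described above).

```python
import re

# Hypothetical stand-in for the text extracted from a single PDF page.
page_text = "a--\n--b\na++b\n"

# Dot does NOT match "\n": comparable to PCRE's default, i.e. pdfgrep -P.
same_line = re.findall(r"a.*b", page_text)

# Dot DOES match "\n": comparable to the default flavor described above.
across_lines = re.findall(r"a.*b", page_text, flags=re.DOTALL)

# In engines that interpret escapes inside a character class (Python's re
# does), the negated class from the question also stays on one line.
negated_class = re.findall(r"a[^\n]*b", page_text)

print(same_line)      # ['a++b']
print(across_lines)   # ['a--\n--b\na++b']
print(negated_class)  # ['a++b']
```

With the newline-excluding dot, only the `a++b` on a single line is reported; with the newline-matching dot, the greedy `.*` swallows the whole page up to the last `b`.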
