I have an assortment of searchable PDF files, and I often search for particular patterns in all of them simultaneously using the pdfgrep
command. My regex knowledge is somewhat limited, and I'm not sure how to work around line breaks and page layout.
For example, I would like to find the pattern "ignor.{0,10}layout" in each example below:
This is a rather difficult You see, I would like to ignore
task that I am trying to page layout and still find the
achieve. pattern I am looking for.
This is a rather difficult This is because I would like to ig-
task that I am trying to nore page layout and still find the
achieve. pattern I am looking for.
In both examples, I would like the first two lines to be reported by
pdfgrep -n "ignor.{0,10}layout" *
but it fails to do so because:
- there is a line break in the middle.
- in the first example, there are more than 10 characters between "ignor" and "layout".
- in the second example, "ignor" is cut in half.
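To make the failure concrete outside of pdfgrep, here is a minimal Python sketch. It rests on several assumptions: the two strings are only the right-hand column of each example, as if the columns had already been separated (real extraction may interleave the columns, which is why the gap in the first example exceeds 10 characters); Python's `re` module stands in for whatever regex engine pdfgrep uses; and `tolerant` is a hypothetical helper, not anything pdfgrep provides. It shows the original pattern missing both examples, and a looser pattern that allows a hyphen plus line break inside each word and lets the gap span a line break.

```python
import re

# Right-hand column of each example (assumption: columns already separated).
example_1 = ("You see, I would like to ignore\n"
             "page layout and still find the\n"
             "pattern I am looking for.")
example_2 = ("This is because I would like to ig-\n"
             "nore page layout and still find the\n"
             "pattern I am looking for.")

# The pattern from the question: "." does not match a newline here.
original = re.compile(r"ignor.{0,10}layout")


def tolerant(word_a, word_b, max_gap=10):
    """Hypothetical helper: build a pattern that tolerates line layout.

    An optional "hyphen + newline" is allowed between any two letters of
    each word, and the gap between the words may contain newlines.
    """
    soft_break = r"(?:-\n)?"
    a = soft_break.join(re.escape(ch) for ch in word_a)
    b = soft_break.join(re.escape(ch) for ch in word_b)
    gap = r"[\s\S]{0,%d}?" % max_gap  # any character, including newline
    return re.compile(a + gap + b)


loose = tolerant("ignor", "layout")

for name, text in (("example 1", example_1), ("example 2", example_2)):
    print(name,
          "original:", bool(original.search(text)),
          "tolerant:", bool(loose.search(text)))
```

This only covers hyphenation and line breaks within a single column. It does not address the interleaved two-column text, where the words between "ignor" and "layout" come from the other column.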
Is there a regex that would solve this problem entirely?