I am using grep to match lines that contain exactly 52 pipe characters (|). The grep command I am using is:

grep -nP "^(.*?\|){52}"

I use -P because the lazy quantifier ? does not work otherwise. When I run this, the following message is displayed: "PCRE's backtracking limit is exceeded". I guess there is something wrong with the Perl-like regex here.

  • Am I running out of memory?
  • Is the problem in the regex I am using?
  • Is there any better regex I can use?

Thanks a lot!

lsmor

1 Answer

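Your PCRE pattern matches 52 occurrences of any 0+ chars, as few as possible, up to and including a | char, and does not check any text beyond that, so it matches lines with at least 52 | chars rather than exactly 52. It also contains a repeated capturing group: as the engine matches, it places each run of 0+ chars before a |, together with the | itself, into the group and rewrites that value on every iteration. In addition, since . can match | too, a line with fewer than 52 | chars forces the engine to try a huge number of ways to split the line before it can give up. In some implementations, this causes the error you reported.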

Note that you do not need a PCRE regex for this task: to match any char but | you may use [^|], which works in a plain POSIX ERE pattern (enabled with the -E option) with grep:

grep -En "^([^|]*\|){52}[^|]*$"

Note the [^|]*$ added at the end. It matches any 0+ chars other than | and then asserts the end-of-line position, so only lines containing exactly 53 |-separated fields (that is, exactly 52 | chars) are matched.
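
To see what the trailing [^|]*$ changes, here is a minimal sketch that scales the count down from 52 to 3 so the sample input stays readable. The function names and the sample lines are made up for illustration only; the patterns are the ones from this answer with {52} replaced by {3}.

```shell
# Sample input: the lines contain 2, 3, 4 and 3 pipes respectively.
sample() {
  printf '%s\n' 'a|b|c' 'a|b|c|d' 'a|b|c|d|e' '|||'
}

# Without the trailing [^|]*$ the pattern only checks a prefix,
# so every line with AT LEAST 3 pipes is reported.
at_least_three() { sample | grep -En '^([^|]*\|){3}'; }

# With [^|]*$ nothing but non-pipe chars may follow the third pipe,
# so only lines with EXACTLY 3 pipes are reported.
exactly_three() { sample | grep -En '^([^|]*\|){3}[^|]*$'; }

echo "at least 3 pipes:"; at_least_three
echo "exactly 3 pipes:";  exactly_three
```

The first call should list lines 2, 3 and 4, the second only lines 2 and 4. Since [^|] can never cross a |, each group has only one way to match, which avoids the heavy backtracking that .*? can trigger.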

Alternatively, you might consider an awk solution (as PS suggests):

awk -F'|' '{if (NF==53) {print NR ":" $0;}}'

where we check for 53 |-separated fields and print the line number, : and the line itself.
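
The same field-count check can be wrapped in a small helper that takes the number of pipes as an argument. This is only a sketch: lines_with_pipes and the awk variable want are hypothetical names, not part of any tool, and the sample input uses 3 pipes instead of 52.

```shell
# lines_with_pipes N: read stdin and print "NR:line" for every line
# that contains exactly N pipe characters (i.e. N + 1 fields).
# Caveat: awk sets NF to 0 for an empty line, so N = 0 skips empty lines.
lines_with_pipes() {
  awk -F'|' -v want="$1" 'NF == want + 1 { print NR ":" $0 }'
}

# Sample lines with 2, 3, 4 and 3 pipes respectively.
printf '%s\n' 'a|b|c' 'a|b|c|d' 'a|b|c|d|e' '|||' | lines_with_pipes 3
```

For the original task the call would be lines_with_pipes 52 < file, where file stands for your input.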

Wiktor Stribiżew
  • Thanks Wiktor. The grep -En "^([^|]*\|){52}[^|]*$" solution worked fine ;) – lsmor Jun 11 '18 at 10:11
  • Re: "In some implementations, it causes the error you provided.": where can I read the details? Why is the limit hard-coded (is it?) instead of "limited only by available memory"? – pmor Feb 15 '22 at 17:34
  • @pmor [Here](https://stackoverflow.com/a/27868983/3832970) you can get more relevant details (PHP also uses the PCRE regex engine, now PCRE2). – Wiktor Stribiżew Feb 15 '22 at 18:42