Here is my regex:

(?<!PAYROLL)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)

Here is my text:

INCOMING WIRE TRUST GS INVESTMENT 
VANGUARD PAYROLL
PAYROLL FIDELITY
ACH CREDIT FIDELITY INVESTM-FIDELITY
ACH CREDIT FIDELITY INVESTM-FIDELITY
ACH DEBIT FIDELITY 
ACH DEBIT FIDELITY 
ACH CREDIT FIDELITY INVESTM-FIDELITY

When I run this on http://regexr.com (using the PCRE regex engine), it matches "PAYROLL FIDELITY", even though I added a negative lookbehind, (?<!PAYROLL), to prevent that.

Any help appreciated.

mikelowry
  • But there is a whitespace, try with `(?<!PAYROLL\s)`, see https://regex101.com/r/MclkGz/1 – Wiktor Stribiżew Dec 31 '20 at 19:40
  • Use [regex101](https://regex101.com/) instead - regex seems to work there – craymichael Dec 31 '20 at 19:40
  • That worked, @WiktorStribiżew. How come I can't use `.*` instead of `\s`? – mikelowry Dec 31 '20 at 19:42
  • It is not possible to use infinite width patterns inside lookbehinds in PCRE regex patterns. You may work around it with `(*SKIP)(*F)`: `\bPAYROLL.*?FIDELITY(*SKIP)(*F)|(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)`. See https://regex101.com/r/MclkGz/2 – Wiktor Stribiżew Dec 31 '20 at 19:42
  • Ah, the `*` quantifier inside a lookbehind makes it non-fixed-width. I'll see if I can find a workaround. – mikelowry Dec 31 '20 at 19:43

1 Answer


The `(?<!PAYROLL)` negative lookbehind matches a location that is not immediately preceded by the PAYROLL character sequence. In the PAYROLL FIDELITY string, FIDELITY is not immediately preceded by PAYROLL; it is immediately preceded by PAYROLL plus a space, so the lookbehind succeeds and the match goes through.

You can solve the current problem in various ways. If you are sure there is always a single whitespace character between words in the string (say, it is a tokenized string), add `\s` after PAYROLL: `(?<!PAYROLL\s)`.
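To see the difference between the two lookbehinds, here is a minimal sketch using Python's standard `re` module. This is an assumption for illustration only: the question uses PCRE, not Python, but both engines treat a fixed-width negative lookbehind the same way. The helper name `matching_lines` is hypothetical, and the sample lines are taken from the question.

```python
import re

# The question's pattern, and the same pattern with \s added to the lookbehind.
ORIGINAL = r"(?<!PAYROLL)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)"
FIXED = r"(?<!PAYROLL\s)(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)"

# Distinct sample lines from the question.
LINES = [
    "INCOMING WIRE TRUST GS INVESTMENT ",
    "VANGUARD PAYROLL",
    "PAYROLL FIDELITY",
    "ACH CREDIT FIDELITY INVESTM-FIDELITY",
    "ACH DEBIT FIDELITY ",
]


def matching_lines(pattern, lines):
    """Return the lines in which the pattern matches at least once."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


original_hits = matching_lines(ORIGINAL, LINES)
fixed_hits = matching_lines(FIXED, LINES)

print("original:", original_hits)
print("fixed:   ", fixed_hits)
```

With the original lookbehind, "PAYROLL FIDELITY" is among the hits because the character right before FIDELITY is a space; with `\s` included in the lookbehind, that line drops out while the other matching lines are unaffected.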

If there can be one or more whitespace characters, the `(?<!PAYROLL\s+)` pattern won't work in PCRE, because PCRE lookbehind patterns must have a fixed width. You can instead match (some of) the exceptions and skip them using the `(*SKIP)(*FAIL)` PCRE verbs:

PAYROLL\s+FIDELITY(*SKIP)(*F)|(FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL)

See the regex demo. You may even replace `PAYROLL\s+FIDELITY(*SKIP)(*F)` with `PAYROLL.*?FIDELITY(*SKIP)(*F)` or `PAYROLL[\s\S]+?FIDELITY(*SKIP)(*F)` to skip any chunk of text from PAYROLL to the leftmost FIDELITY after it. The `.*?` version stays within a single line unless the dot is set to match line breaks, while `[\s\S]+?` can also cross line breaks.

`PAYROLL\s+FIDELITY(*SKIP)(*F)` matches PAYROLL, one or more whitespace characters, and FIDELITY, and then `(*F)` forces the match to fail. Because of `(*SKIP)`, the engine does not retry from positions inside the text it has just consumed: the next match attempt starts at the index where the failure occurred.
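Python's standard `re` module has the same fixed-width lookbehind restriction but no `(*SKIP)(*F)` verbs, so the sketch below swaps in a different, commonly used technique: match the unwanted `PAYROLL\s+FIDELITY` text as a plain first alternative, capture the wanted text in group 1, and keep only the matches where group 1 took part. This is an emulation under those assumptions, not the PCRE verbs themselves; the names `compiles`, `EMULATED`, and `wanted_matches` are hypothetical.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# First alternative: the exception. It consumes the text and leaves group 1 unset.
# Second alternative: the wanted match from the question, captured in group 1.
EMULATED = re.compile(
    r"PAYROLL\s+FIDELITY"
    r"|((?:FIDELITY(?!.*TITLE)(?!.*NATION)|INVEST)(?!.*PAYROLL))"
)


def wanted_matches(line):
    """Return only the matches produced by the second (captured) alternative."""
    return [m.group(1) for m in EMULATED.finditer(line) if m.group(1) is not None]


print(compiles(r"(?<!PAYROLL\s)FIDELITY"))   # fixed-width lookbehind
print(compiles(r"(?<!PAYROLL\s+)FIDELITY"))  # variable-width lookbehind
print(wanted_matches("PAYROLL   FIDELITY"))
print(wanted_matches("ACH DEBIT FIDELITY "))
```

As with the verbs, the scan resumes after the consumed `PAYROLL ... FIDELITY` text, so the FIDELITY inside it is never reported, regardless of how many spaces separate the two words.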

Wiktor Stribiżew