I'm trying to scan all attributes from a database, searching for specific patterns and ignoring similar ones that I know should not match, but I'm running into problems, as in the example below:

Let's say I'm trying to find Customer Registration Numbers and one of my patterns is `.*CRN.*`. I then ignore everything that is not a CRN (like currency and country name) with `(CRN)(?!CY|AME)`. So far everything works fine, since lookahead is supported in JavaScript.

The next step is to exclude things like SCRN (screen), for example, but the lookbehind `(?<!S)(CRN)(?!CY|AME)` doesn't work.

Is there any alternative?

Example inputs: CREDIT_CARD, DISCARD, CARDINALITY, CARDNO

My regex: `(?!.*DISCARD.*|.*CARDINALITY.*).*CARD.*`

CARDINALITY was removed, but DISCARD is still being matched :(

lgvicente
  • Also, if you are using a SQL database, please tell us which _version_ you are using (e.g. MySQL, SQL Server, Oracle, Postgres, etc.). – Tim Biegeleisen Sep 20 '19 at 01:36
  • Not using it on a database. It is a javascript that iterates a data dictionary in a flat file. – lgvicente Sep 20 '19 at 03:47
  • I really appreciate the help but none of these patterns are removing the match with the word "DISCARD" from the example. The other post is referring to the negative look-ahead which I'm using to exclude strings after CARD like CARDINALITY but not before like DISCARD. So the problem is still there :( – lgvicente Sep 20 '19 at 05:46
  • You are just missing a `^` at the start. – Wiktor Stribiżew Sep 22 '19 at 16:51

1 Answer

The regex that you want is:

(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)

It is important to test the negative lookahead against the entire word, which is why the pattern matches (\b\w*CARD\w*\b) rather than just CARD. The problem with the following regex:

(?!(?:CARDINALITY|DISCARD))CARD

is that in the case of DISCARD, by the time the scan reaches the character position where CARD begins, it is already past DIS, so you would need a negative lookbehind to eliminate DISCARD from consideration. When matching the complete word, as in the regex I propose, the scan is still at the start of the word when the negative lookahead is applied.
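
As a quick check, here is a minimal JavaScript sketch that runs both patterns over the example inputs from the question. The variable names (`wholeWord`, `partial`, `inputs`, `keptWholeWord`, `keptPartial`) are made up for illustration, and it assumes each attribute name is tested as a separate string, as when iterating a data dictionary:

```javascript
// Whole-word pattern: the negative lookahead is evaluated at the start of
// the word, before \w* has consumed any prefix such as "DIS".
const wholeWord = /(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)/;

// Partial pattern, for contrast: the lookahead is only evaluated at the
// position where "CARD" itself begins.
const partial = /(?!(?:CARDINALITY|DISCARD))CARD/;

// Example inputs taken from the question.
const inputs = ["CREDIT_CARD", "DISCARD", "CARDINALITY", "CARDNO"];

// Neither regex uses the "g" flag, so test() keeps no state between calls.
const keptWholeWord = inputs.filter((name) => wholeWord.test(name));
const keptPartial = inputs.filter((name) => partial.test(name));

console.log(keptWholeWord); // [ 'CREDIT_CARD', 'CARDNO' ]
console.log(keptPartial);   // [ 'CREDIT_CARD', 'DISCARD', 'CARDNO' ]
```

Note that `\w` also matches the underscore, so CREDIT_CARD is treated as a single word here.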

Booboo