I have a vector of strings. I want to match only those that contain "feature" followed by an optional dot. The matched strings must not contain "ONLY" or "PRACA".

feature.Agencyjna_ONLY # needs to be removed
feature.UoP # needs to be matched
feature.UoD # needs to be matched
feature.PRACA # needs to be removed
featurePRACA # needs to be removed

I tried the following two patterns, but neither of them works.

\bfeature\.?[^(ONLY|PRACA)]+\b
\bfeature\.?\w+(?!(ONLY|PRACA))\b

Desired output:

feature.UoP 
feature.UoD

Example: regex101

Any help would be appreciated!

jeparoff

1 Answer

The negative lookahead should be placed at the start of the string so that it checks the whole string for the unwanted words. After that, you can match feature with an optional dot.

Writing [^(ONLY|PRACA)] in a pattern creates a negated character class, which can also be written as [^()CAONLYPR|]. It matches any single char except the ones listed, so the | is a literal pipe char and does not mean OR.
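
The point about the negated character class can be checked with a small sketch. It assumes Python's re module (the thread does not name a language), and the helper name accepted_chars and the probe string are made up for illustration.

```python
import re

# The bracket expression from the question's first attempt.
asker_class = re.compile(r"[^(ONLY|PRACA)]")
# The same class with duplicates removed, as written in the answer.
equivalent_class = re.compile(r"[^()CAONLYPR|]")


def accepted_chars(pattern, candidates):
    """Return the characters of `candidates` that the one-char pattern matches."""
    return [ch for ch in candidates if pattern.fullmatch(ch)]


# Every listed char (including the pipe and parentheses) plus a few others.
probe = "ONLYPRACA()|Uo.x"

print(accepted_chars(asker_class, probe))
print(accepted_chars(equivalent_class, probe))
```

Both calls accept the same characters, and the pipe and parentheses are rejected like any other listed char, which is why the class cannot express "ONLY or PRACA" as whole words.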

^(?!.*(?:ONLY|PRACA)).*\bfeature\.?\w+
  • ^ Start of string
  • (?! Negative lookahead, assert that what is to the right is not
    • .* Match any char except a newline 0+ times
    • (?:ONLY|PRACA) match either ONLY or PRACA
  • ) Close the negative lookahead
  • .*\bfeature\.? Match as many chars as possible, then match feature and an optional dot
  • \w+ Match 1+ word chars
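
As a quick check of the breakdown above, here is a minimal sketch that runs the pattern over the five sample strings from the question. It assumes Python's re module, which the thread does not specify; the names PATTERN, samples and matched are illustrative only.

```python
import re

# Pattern from the answer: the negative lookahead is anchored at the start.
PATTERN = re.compile(r"^(?!.*(?:ONLY|PRACA)).*\bfeature\.?\w+")

# Sample strings from the question (the trailing remarks are omitted).
samples = [
    "feature.Agencyjna_ONLY",
    "feature.UoP",
    "feature.UoD",
    "feature.PRACA",
    "featurePRACA",
]

# Keep only the strings in which the pattern finds a match.
matched = [s for s in samples if PATTERN.search(s)]
print(matched)  # ['feature.UoP', 'feature.UoD']
```

Because ^ only matches at position 0 here, the lookahead is evaluated once per string, and any string containing ONLY or PRACA is rejected before the rest of the pattern is tried.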

Regex demo

The fourth bird
  • Thank you for the answer, it works, but I would appreciate it if you could explain how the first part of the expression works. I have some difficulties with possessive matching, so I do not understand how (?:ONLY|PRACA) works together with ?! – jeparoff May 17 '21 at 11:58
  • @HermanCherniaiev The `(?!` is an assertion. You place it right after the anchor `^` that asserts the start of the string to have it run only once. If you do not "anchor" it, the assertion might run multiple times in the pattern, yielding unwanted results, because there are positions in the string where the assertion can become true. See an example of unwanted matches where the assertion is not anchored at the start of the string: https://regex101.com/r/neKQpr/1 – The fourth bird May 17 '21 at 12:02
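
To make the remark about positions where the assertion can become true more concrete, the sketch below runs the second attempt from the question, where the lookahead sits after \w+ instead of at the start. This is not the regex101 example linked in the comment, only an illustration that assumes Python's re module; the names ATTEMPT, samples and still_matched are made up.

```python
import re

# Second attempt from the question: the lookahead comes after \w+.
ATTEMPT = re.compile(r"\bfeature\.?\w+(?!(ONLY|PRACA))\b")

# Sample strings from the question (the trailing remarks are omitted).
samples = [
    "feature.Agencyjna_ONLY",
    "feature.UoP",
    "feature.UoD",
    "feature.PRACA",
    "featurePRACA",
]

# Show what each string matches, if anything.
for s in samples:
    m = ATTEMPT.search(s)
    print(s, "->", m.group(0) if m else None)

# Strings the attempt fails to filter out.
still_matched = [s for s in samples if ATTEMPT.search(s)]
```

Here \w+ has already consumed ONLY or PRACA by the time the lookahead is checked, so at the end of the string nothing unwanted follows and the assertion succeeds. All five strings match, which is why the check belongs right after ^.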