
I want to grab phrases that contain good or great, but are not negated by the word not or isn't directly before them.

sents= ["good words",                   # Words after phrase
        "not good words",
        "isn't good words",

        "great words",
        "not great words",
        "isn't great words",



        "words good",                   # Words before phrase
        "words not good",
        "words isn't good",

        "words great",
        "words not great",
        "words isn't great",


        
        "words good words",             # Words before and after phrase
        "words not good words",
        "words isn't good words",

        "words great words",
        "words not great words",
        "words isn't great words",
]

I want to return

good words
words good
words good words

great words
words great
words great words

What regular expression will let me do this? Ideally, I want a list of target words that match only when no word from a list of negatives directly precedes them.

Wiktor Stribiżew
Desi Pilla

2 Answers

2

You may use this regex with two negative lookbehind assertions in Python:

(?<!isn't )(?<!not )\b(?:good|great)\b


RegEx Details:

  • (?<!isn't ): Negative lookbehind to fail the match if we have isn't followed by a single space behind us
  • (?<!not ): Negative lookbehind to fail the match if we have not followed by a single space behind us
  • \b: Word boundary
  • (?:good|great): Match good or great
  • \b: Word boundary

Code:

>>> import re
>>> sents= ["good words",                   # Words after phrase
...         "not good words",
...         "isn't good words",
...         "great words",
...         "not great words",
...         "isn't great words",
...         "words good",                   # Words before phrase
...         "words not good",
...         "words isn't good",
...         "words great",
...         "words not great",
...         "words isn't great",
...         "words good words",             # Words before and after phrase
...         "words not good words",
...         "words isn't good words",
...         "words great words",
...         "words not great words",
...         "words isn't great words",
... ]
>>> reg = re.compile(r"(?<!isn't )(?<!not )\b(?:good|great)\b")
>>> for s in sents:
...     if reg.search(s):
...         print(s)
...
good words
great words
words good
words great
words good words
words great words
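
The question also asks about working from a list of target words and a list of negatives. A minimal sketch of that generalization follows; `build_pattern` is a hypothetical helper name, not part of the `re` module. It emits one lookbehind per negative because Python's `re` requires each lookbehind to have a fixed width, so negatives of different lengths cannot share a single lookbehind group.

```python
import re

def build_pattern(targets, negatives):
    """Compile a regex matching any target not directly preceded by a negative.

    Hypothetical helper for illustration; assumes each negative is
    separated from the target by exactly one space.
    """
    # One fixed-width negative lookbehind per negative word.
    lookbehinds = "".join(f"(?<!{re.escape(neg)} )" for neg in negatives)
    # Alternation of the escaped target words.
    alternation = "|".join(re.escape(t) for t in targets)
    return re.compile(rf"{lookbehinds}\b(?:{alternation})\b")

pattern = build_pattern(["good", "great"], ["not", "isn't"])
sample = ["good words", "not good words", "words isn't great", "words great words"]
# Keep only the sentences where a non-negated target word is found.
kept = [s for s in sample if pattern.search(s)]
print(kept)
```

With the two lists from the question, this builds the same pattern as the one shown above.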
anubhava
0

You need a lookbehind, in this case a negative one (a positive version exists as well). It can be as simple as:

(?<!not\s)great

In this example, the word not cannot appear directly before great.

Here is how the full pattern may look:

(?<!not\s)(?<!isn't\s)(great|good)
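
As a rough illustration of how this pattern behaves in Python, the sketch below compares it with a variant that adds `\b` word boundaries. The names `loose`, `strict`, and `literal_space` are assumptions made for this example. Without the boundaries the pattern also matches inside longer words such as goodness, and `\s` treats a tab or newline after the negative the same way as a space.

```python
import re

# Pattern as written above: no word boundaries around the target words.
loose = re.compile(r"(?<!not\s)(?<!isn't\s)(great|good)")
# Same pattern with word boundaries added (an assumed refinement).
strict = re.compile(r"(?<!not\s)(?<!isn't\s)\b(great|good)\b")
# Variant using a literal space instead of \s in the lookbehind.
literal_space = re.compile(r"(?<!not )\b(?:good|great)\b")

# "goodness" contains "good": only the boundary-free pattern matches it.
print(bool(loose.search("goodness")), bool(strict.search("goodness")))

# A tab after "not" counts as whitespace for \s but is not a literal space.
print(bool(strict.search("not\tgood")), bool(literal_space.search("not\tgood")))
```

Whether `\s` or a literal space is the better choice depends on how the input text is tokenized.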


Dalorzo