Find all words/substrings that don't match regex?

Question

I want to find all of the parts of a string that are not matched by a regex.

Let's say I have a regex r'foo|bar' and the string 'Hello foo how are you bar'. How can I get every word other than what the regex matches, so that the result is ['Hello', 'how', 'are', 'you']?

Is it always whole words that you need to exclude? If yes, I would say a regex is a little overkill. - Austin, Mar 09 '20 at 18:20
I'd probably start by trying to get a collection of all of the "words" (not always as easy as you think... what do you do with `well-respected`?). Then compare your list of words against the list of words you want to exclude. - JDB, Mar 09 '20 at 18:28

dawg · Answer 1 · 2020-03-09T18:25:31.363

You can use a list comprehension and negate the regex match:

>>> import re
>>> st = 'Hello foo how are you bar'
>>> [w for w in st.split() if not re.search(r'foo|bar', w)]
['Hello', 'how', 'are', 'you']

You did not ask, but you will likely want anchors in your regex (for example r'^(?:foo|bar)$') so that words such as foofoo, barfoo, or fooblulator are handled as you expect.
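
As a rough sketch of that point, assuming the goal is to drop only the exact words foo and bar: re.search matches anywhere inside a word, while re.fullmatch (equivalent here to anchoring the pattern with ^ and $) matches only whole words. The sample word list is made up for illustration.

```python
import re

# Hypothetical sample containing words that merely contain 'foo' or 'bar'
words = 'Hello foo foofoo barfoo fooblulator bar'.split()

# Unanchored: drops every word containing 'foo' or 'bar' anywhere
unanchored = [w for w in words if not re.search(r'foo|bar', w)]

# Anchored: drops only words that are exactly 'foo' or 'bar'
anchored = [w for w in words if not re.fullmatch(r'foo|bar', w)]

print(unanchored)  # ['Hello']
print(anchored)    # ['Hello', 'foofoo', 'barfoo', 'fooblulator']
```

Which behavior is "expected" depends on whether longer words containing foo or bar should survive the filter.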

And if you just have simple word lookups that do not require a regex, the same method works:

>>> [w for w in st.split() if w not in {'foo', 'bar'}]
['Hello', 'how', 'are', 'you']

score 1 · Answer 2 · answered Mar 09 '20 at 18:28

Very similar to @dawg's answer, but you can also use a negative lookahead in the regex:

import re
st = 'Hello foo how are you bar'
# The negative lookahead rejects any word that starts with 'foo' or 'bar'
[w for w in st.split() if re.search(r'^(?!(foo|bar))', w)]  # ['Hello', 'how', 'are', 'you']
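
Note that this lookahead rejects any word that starts with foo or bar, so a word such as food is dropped as well. A possible variation, assuming only the exact words should be excluded, puts $ inside the lookahead. The sample string below is made up for illustration.

```python
import re

# Hypothetical sample: 'food' and 'barn' only start with the excluded words
st = 'Hello foo food how barn bar'

# Pattern from the answer: rejects any word beginning with 'foo' or 'bar'
prefix = [w for w in st.split() if re.search(r'^(?!(foo|bar))', w)]

# Variation: '$' inside the lookahead rejects only the exact words
exact = [w for w in st.split() if re.search(r'^(?!(?:foo|bar)$)', w)]

print(prefix)  # ['Hello', 'how']
print(exact)   # ['Hello', 'food', 'how', 'barn']
```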

mad_

score 0 · Answer 3 · answered Mar 09 '20 at 18:41

This should do it:

\b(?!foo\b|bar\b)[A-Za-z]+

The pattern breaks down as follows:

\b         # match a word boundary
(?!        # begin a negative lookahead
  foo\b    # match 'foo' followed by a word boundary
  |        # or
  bar\b    # match 'bar' followed by a word boundary
)          # end negative lookahead
[A-Za-z]+  # match 1+ letters
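
In Python, a pattern like this could be applied to the whole string with re.findall, with no need to split it into words first. This is a minimal sketch; lowercase foo is assumed so that the pattern fits the string from the question.

```python
import re

# Lowercase 'foo' is assumed here so the pattern fits the question's string
pattern = re.compile(r'\b(?!foo\b|bar\b)[A-Za-z]+')

st = 'Hello foo how are you bar'

# findall returns every run of letters that is not exactly 'foo' or 'bar'
kept = pattern.findall(st)
print(kept)  # ['Hello', 'how', 'are', 'you']
```

Because the character class is [A-Za-z]+, digits, hyphens, and apostrophes end a match, so a token such as well-respected would come back as two separate words.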
