
So I want to find all of the parts of the string that aren't matched by the regex.

Let's say I have the regex r'foo|bar' and the string 'Hello foo how are you bar'. How can I get every word that the regex doesn't match, so that it returns ['Hello', 'how', 'are', 'you']?

martineau
ontley
  • Is it always words that you need to exclude? If yes, I would say a regex is a little overkill. – Austin Mar 09 '20 at 18:20
  • I'd probably start by trying to get a collection of all of the "words" (not always as easy as you think... what do you do with `well-respected`?). Then compare your list of words against the list of words you want to exclude. – JDB Mar 09 '20 at 18:28
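A minimal sketch of the approach JDB describes, assuming a "word" is a run of letters optionally joined by single hyphens, so `well-respected` stays whole. The tokenizing pattern, the `exclude` set, and the sample text are illustrative choices, not part of the question. Unlike `str.split()`, this also keeps punctuation such as the comma in `foo,` from sticking to a word.

```python
import re

text = 'Hello foo, a well-respected bar'
exclude = {'foo', 'bar'}  # hypothetical exclusion list

# Tokenize: letters, optionally joined by single hyphens (keeps "well-respected" whole)
words = re.findall(r'[A-Za-z]+(?:-[A-Za-z]+)*', text)

# Compare each word against the exclusion set
kept = [w for w in words if w not in exclude]

print(words)  # ['Hello', 'foo', 'a', 'well-respected', 'bar']
print(kept)   # ['Hello', 'a', 'well-respected']
```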

3 Answers


You can use a list comprehension and negate the regex match:

>>> import re
>>> st='Hello foo how are you bar'
>>> [w for w in st.split() if not re.search(r'foo|bar', w)]
['Hello', 'how', 'are', 'you']

You did not ask, but you would likely want to anchor your regex: without anchors, re.search also excludes any word that merely contains foo or bar, such as foofoo, barfoo, or fooblulator.
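A minimal sketch of the anchored version, assuming the goal is to drop only words that are exactly foo or bar. `re.fullmatch` requires the whole word to match, which has the same effect as writing the pattern as `^(?:foo|bar)$`. The sample words are made up for illustration.

```python
import re

words = 'Hello foo how foofoo barfoo fooblulator bar'.split()

# Unanchored: re.search drops any word that merely contains foo or bar
unanchored = [w for w in words if not re.search(r'foo|bar', w)]

# Anchored: re.fullmatch drops only words that are exactly foo or bar
anchored = [w for w in words if not re.fullmatch(r'foo|bar', w)]

print(unanchored)  # ['Hello', 'how']
print(anchored)    # ['Hello', 'how', 'foofoo', 'barfoo', 'fooblulator']
```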

And if you only need simple word lookups that don't require a regex, the same approach works with a set:

>>> [w for w in st.split() if w not in {'foo', 'bar'}]
['Hello', 'how', 'are', 'you']
dawg

Very similar to @dawg's answer, but you can use a negative lookahead in the regex instead:

import re

st='Hello foo how are you bar'
[w for w in st.split() if re.search(r'^(?!(foo|bar))', w)] # ['Hello', 'how', 'are', 'you']
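One caveat: the lookahead above is anchored only at the start, so it also drops words that merely begin with foo or bar. A sketch using a made-up word, football, to show this, with `$` added inside the lookahead so that only exact matches are rejected:

```python
import re

st = 'Hello foo how football are you bar'

# Start-anchored lookahead: rejects any word that begins with foo or bar
prefix = [w for w in st.split() if re.search(r'^(?!(foo|bar))', w)]

# Lookahead that also requires the end of the word: rejects only exact foo or bar
exact = [w for w in st.split() if re.search(r'^(?!(?:foo|bar)$)', w)]

print(prefix)  # ['Hello', 'how', 'are', 'you']
print(exact)   # ['Hello', 'how', 'football', 'are', 'you']
```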
mad_

This should do it:

\b(?!foo\b|bar\b)[A-Za-z]+

We have:

\b         # match a word boundary
(?!        # begin a negative lookahead
  foo\b    # match 'foo' followed by a word boundary
  |        # or
  bar\b    # match 'bar' followed by a word boundary
)          # end negative lookahead
[A-Za-z]+  # match 1+ letters
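A sketch of how this pattern might be applied in Python, assuming the lowercase foo from the question (the match is case-sensitive, so a capitalized Foo would not exclude foo). `re.findall` scans the original string directly, so no `split` is needed, and the commented breakdown can be compiled as-is with `re.VERBOSE`, which ignores the whitespace and `#` comments.

```python
import re

st = 'Hello foo how are you bar'

# Compact form of the pattern
compact = re.findall(r'\b(?!foo\b|bar\b)[A-Za-z]+', st)

# Same pattern with inline comments; re.VERBOSE ignores whitespace and # comments
pattern = re.compile(r"""
    \b          # match a word boundary
    (?!         # begin a negative lookahead
      foo\b     # 'foo' followed by a word boundary
      |         # or
      bar\b     # 'bar' followed by a word boundary
    )           # end negative lookahead
    [A-Za-z]+   # match 1+ letters
""", re.VERBOSE)
verbose = pattern.findall(st)

print(compact)  # ['Hello', 'how', 'are', 'you']
print(verbose)  # ['Hello', 'how', 'are', 'you']
```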
Cary Swoveland