Do not match if word appears in regex

Question

I have a URL, and I want the regex to NOT match if the word 'season' is contained in the URL. Here are two examples:

CONTAINS SEASON, DO NOT MATCH
'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'

DOES NOT CONTAIN SEASON, MATCH
'http://imdb.com/title/tt0285331/'

Here is what I have so far, but I'm afraid the .+ will match everything until the end. What would be the correct regex to use here?

r'http://imdb.com/title/tt(\d)+/.+^[season].+'

what is wrong with [`if word in mystring:`](http://stackoverflow.com/a/5319942/1959948)? — Dalorzo, Aug 22 '14 at 22:12
Look-ahead with `(?=.*season)` to detect it (or `(?!.*season)` to ensure it doesn't exist). — OnlineCop, Aug 22 '14 at 22:13
@Dalorzo: the only difference I see is that that answer doesn't take word boundaries into account. — Casimir et Hippolyte, Aug 22 '14 at 22:14
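
The two approaches raised in these comments can be compared with a small sketch. The helper names `has_season_substring` and `has_season_word`, and the third sample URL, are made up for illustration and do not come from the question.

```python
import re

def has_season_substring(url):
    # Plain substring test: true wherever the letters "season" occur.
    return 'season' in url

def has_season_word(url):
    # Regex test: "season" must stand alone as a whole word.
    return re.search(r'\bseason\b', url) is not None

urls = [
    'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7',
    'http://imdb.com/title/tt0285331/',
    'http://imdb.com/title/tt0285331/preseasons',  # hypothetical URL
]

for url in urls:
    print(has_season_substring(url), has_season_word(url), url)
```

For the hypothetical third URL the substring test reports a hit while the word-boundary test does not, which is the difference the last comment points out. If that distinction does not matter for your data, the plain `in` test is probably the simpler choice.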

score 2 · Accepted Answer · answered Aug 22 '14 at 22:13

Use a negative lookahead:

urls='''\
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/'''

import re

print(re.findall(r'^(?!.*\bseason\b)(.*)', urls, re.M))
# ['http://imdb.com/title/tt0285331/']

hwnd · Answer 2 · 2014-08-22T22:20:07.487

You cannot use whole words inside of character classes; a class only ever matches a single character. You have to use a negative lookahead instead.

>>> s = '''
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/
http://imdb.com/title/tt1111111/episodes?this=2
http://imdb.com/title/tt0123456/episodes?this=1&season=1&ref_=tt_eps_sn_1'''
>>> import re
>>> re.findall(r'\bhttp://imdb.com/title/tt(?!\S+\bseason)\S+', s)
# ['http://imdb.com/title/tt0285331/', 'http://imdb.com/title/tt1111111/episodes?this=2']
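
To illustrate the point about character classes, here is a minimal sketch. The variable names are chosen for this example only; the two URLs are the ones from the question.

```python
import re

clean_url = 'http://imdb.com/title/tt0285331/'
season_url = 'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'

# [season] is a character class: it matches ONE character that is
# 's', 'e', 'a', 'o' or 'n'. It does not match the word "season".
single_chars = re.findall(r'[season]', 'sea')

# The clean URL contains no "season", yet the class still finds a
# letter in it (for example the 'o' in "com").
class_hits_clean = re.search(r'[season]', clean_url) is not None

# A negative lookahead checks for the whole word without consuming text.
no_season = re.compile(r'(?!.*season)')
lookahead_clean = no_season.match(clean_url) is not None
lookahead_season = no_season.match(season_url) is not None

print(single_chars, class_hits_clean, lookahead_clean, lookahead_season)
```

The lookahead succeeds on the clean URL and fails on the one containing `season`, while the character class cannot tell the two apart.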

Avinash Raj · Answer 3 · 2014-08-22T23:37:43.817

Use a negative lookahead just after `tt\d+/`:

>>> import re
>>> s = """http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
... http://imdb.com/title/tt0285331/
... """
>>> m = re.findall(r'^http://imdb.com/title/tt\d+/(?:(?!season).)*$', s, re.M)
>>> for i in m:
...     print(i)
... 
http://imdb.com/title/tt0285331/

edited Aug 22 '14 at 23:37

answered Aug 22 '14 at 22:18

The `*` already ensures that it can match even if there's nothing after the final slash. Wrapping the last part of the regex in a group and making it optional serves no purpose. – Alan Moore Aug 22 '14 at 23:34
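
The comment above can be checked with a short sketch. The pattern is the one from this answer, unchanged; the variable names and the `with_path` URL are assumptions added for illustration.

```python
import re

# Pattern from the answer above: after "tt<digits>/", consume any
# character as long as the word "season" does not start at that position.
pattern = re.compile(r'^http://imdb.com/title/tt\d+/(?:(?!season).)*$')

bare = 'http://imdb.com/title/tt0285331/'
with_path = 'http://imdb.com/title/tt0285331/episodes?this=2'  # hypothetical URL
with_season = 'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'

# The starred group may repeat zero times, so the bare URL matches
# without any extra optional wrapper around the group.
matches_bare = pattern.match(bare) is not None
matches_path = pattern.match(with_path) is not None
matches_season = pattern.match(with_season) is not None

print(matches_bare, matches_path, matches_season)
```

The first two URLs match and the third does not, since the repeated group stops right before `season` and `$` then fails.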
