0

I was looking for a single regular expression that could match everything that does NOT contain a given set of substrings.

For example, a regular expression that will match everything that does not contain the substrings "abc", "def" or "ghi".

In this example, the regex would match against "student", "apple" and "maria", but would not match against "definition", "ghint" or "abc123".

Thanks in advance

  • How about learning how regexes work and trying to write it yourself before asking here? – Jerry Oct 10 '13 at 20:36
  • [Negative lookaround](http://www.regular-expressions.info/lookaround.html) will do. – Terry Li Oct 10 '13 at 20:40
  • Why do you want to match everything that does not contain the substring if you are performing the match on a single string like "apple"? There are only 2 possible outcomes for the match, true or false. Do you just want to match a single string like "apple", or do you want to match against substrings of something like "Judy wants an apple"? That would significantly change the approach. – Sedecimdies Oct 11 '13 at 07:01
  • That can be done, but it is a mess. Why not just match normally and invert the answer? – vonbrand Mar 01 '14 at 22:43

3 Answers

1

That's what you use a negative lookahead assertion for:

^(?!.*(abc|def|ghi))

will match as long as the input string doesn't contain any of the "bad" words.

Note that the lookahead assertion itself doesn't match anything, so the match result (in the case of a successful match) will be an empty string.

In Python:

>>> import re
>>> regex = re.compile("^(?!.*(abc|def|ghi))")
>>> [bool(regex.match(s)) for s in ("student", "apple", "maria",
...                                 "definition", "ghint", "abc123")]
[True, True, True, False, False, False]
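
The same idea can be generalized to an arbitrary list of substrings. The following is only a sketch: the helper name `build_exclusion_regex` is hypothetical, and the use of `re.escape` and `re.DOTALL` is an assumption about what you probably want (literal substrings, and input that may contain newlines), not part of the pattern above.

```python
import re

def build_exclusion_regex(forbidden):
    """Compile a pattern that matches only strings containing none of `forbidden`."""
    # re.escape keeps characters such as "." or "+" in a substring literal.
    alternation = "|".join(re.escape(word) for word in forbidden)
    # re.DOTALL lets "." cross newlines, so multi-line input is scanned in full.
    return re.compile(r"^(?!.*(?:%s))" % alternation, re.DOTALL)

regex = build_exclusion_regex(["abc", "def", "ghi"])
results = [bool(regex.match(s)) for s in ("student", "apple", "maria",
                                          "definition", "ghint", "abc123")]

# Without re.DOTALL the lookahead would stop at the first newline
# and miss the "abc" on the second line.
multiline = bool(regex.match("apple\nabc"))
```

Here `results` agrees with the list shown above, and `multiline` is `False` because the forbidden substring on the second line is still seen.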
Tim Pietzcker
1

You can use lookaheads:

^(?!.*?(?:abc|def|ghi)).*$
  • (?!...) is a negative lookahead.
  • (?:...) is a non-capturing group.
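
Because of the trailing `.*$`, a successful match returns the whole input rather than an empty string. A minimal Python sketch to illustrate this (the variable names are only for illustration):

```python
import re

# The trailing .*$ consumes the rest of the line once the lookahead has passed.
pattern = re.compile(r"^(?!.*?(?:abc|def|ghi)).*$")

words = ["student", "apple", "maria", "definition", "ghint", "abc123"]
matched = []
for w in words:
    m = pattern.match(w)
    # Keep the matched text for allowed words, None for rejected ones.
    matched.append(m.group(0) if m else None)
```

With these inputs, `matched` holds the full text of the three allowed words and `None` for the three that contain a forbidden substring.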

Regex Reference

anubhava
0

If you have a string containing the "forbidden" words, like the one below:

student apple maria definition ghint abc123 righit

and you just want to know whether the string contains them, you can use:

.*?(?!def|abc|ghi)

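This will give you 4 non-empty matches (the pattern can also produce empty matches, and the exact results depend on the regex engine):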

  • d
  • g
  • a
  • g

These are the first letters of the forbidden words (*def*inition, *ghi*nt, *abc*123, ri*ghi*t).

If no such matches are found in your string, it contains no "forbidden" words.
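
Since engines differ in how they continue after an empty match, the lookahead pattern above may not behave the same everywhere. As an alternative (a different technique, not the pattern above), here is a minimal Python sketch that searches for the forbidden substrings directly; the variable names are assumptions for illustration:

```python
import re

text = "student apple maria definition ghint abc123 righit"
forbidden = re.compile(r"abc|def|ghi")

# Every occurrence of a forbidden substring, in order of appearance.
hits = [m.group(0) for m in forbidden.finditer(text)]
first_letters = [h[0] for h in hits]

# The string is "clean" only if there are no hits at all.
contains_forbidden = bool(hits)
```

For the sample string this finds four occurrences, one per offending word, and an empty `hits` list would mean there are no "forbidden" words.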

You can also use a regex replace with:

\w*(abc|def|ghi)\w*

This replaces every word that contains a "forbidden" substring with "", so only the words without any forbidden substring are left.
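
In Python, the replacement could be sketched with `re.sub` as follows (using Python here is an assumption; the answer does not name a language):

```python
import re

text = "student apple maria definition ghint abc123 righit"

# \w* on both sides extends the match to the whole word around the forbidden part.
cleaned = re.sub(r"\w*(abc|def|ghi)\w*", "", text)

# Removing words leaves extra spaces behind; split() collapses them.
kept = cleaned.split()
```

After the substitution, `kept` contains only "student", "apple" and "maria".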

Sedecimdies