How to exclude only certain words in regex

Question

I want to alter my regular expression. My current expression allows users to input anything as long as it doesn't contain the word "white" or the combination of the words "cat" and "dog" (either "cat" or "dog" on its own is allowed).

regex before change:
/^((?!(white|cat.*dog|dog.*cat))[\s\S])*$/i

Is it possible to alter this regex so that inputs like "A white tiger" are valid, but the sole word "white" is not?

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

Solution

What you need to do is make it more efficient by running the lookahead only once, at the start of the string:

/^(?!white$|[\s\S]*(?:cat[\s\S]*dog|dog[\s\S]*cat))[\s\S]*$/i

See the regex demo (there, [\s\S] is replaced with . only because the demo input is tested line by line).

Explanation

The /^((?!(white|cat.*dog|dog.*cat))[\s\S])*$/i regex contains an anchored tempered greedy token, taken from the well-known "Regular expression to match a line that doesn't contain a word?" post. [\s\S] matches any character (including a newline) unless that character is the first one of a sequence matched by the negative lookahead. So the regex above matches any string except one that contains white, or cat followed by 0+ characters other than a newline and then dog, or, vice versa, dog followed by 0+ characters other than a newline and then cat.
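A minimal JavaScript sketch of how the original pattern behaves. The sample strings are illustrative assumptions, not from the question. It shows that any occurrence of white is rejected, and that the .* inside the lookahead stops at a line break:

```javascript
// The question's original pattern: an anchored tempered greedy token.
const original = /^((?!(white|cat.*dog|dog.*cat))[\s\S])*$/i;

console.log(original.test("A white tiger")); // false: "white" occurs somewhere
console.log(original.test("black cat"));     // true: "cat" on its own is fine
console.log(original.test("cat and dog"));   // false: cat ... dog on one line
console.log(original.test("cat\ndog"));      // true: .* cannot cross the newline
```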

So white must be tested between anchors: ^(?!white$)[\s\S]*$ performs that check.
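A short sketch of that anchored check on its own, with the i flag kept from the question. The test strings are illustrative:

```javascript
// Only the "white" check, with white$ placed inside a start-anchored lookahead.
const notExactlyWhite = /^(?!white$)[\s\S]*$/i;

console.log(notExactlyWhite.test("white"));         // false
console.log(notExactlyWhite.test("WHITE"));         // false (i flag)
console.log(notExactlyWhite.test("A white tiger")); // true
console.log(notExactlyWhite.test("whitewash"));     // true: white$ needs the string to end right after "white"
```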

The remaining alternatives still need to be checked anywhere in the string, so [\s\S]* must be placed before the (?:cat[\s\S]*dog|dog[\s\S]*cat) group: [\s\S]*(?:cat[\s\S]*dog|dog[\s\S]*cat). That way, we make sure the string does not contain these patterns anywhere. Note that .* cannot cross a line break, so a lookahead that runs once at the start with .* would only check the first line; [\s\S]* checks the whole string.
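To illustrate the note about .* versus [\s\S]*, here is a sketch comparing two start-anchored lookaheads. The variable names and the multiline sample are assumptions made for the demo:

```javascript
// The lookahead runs once at position 0; [\s\S]* lets it scan across line breaks.
const acrossLines = /^(?![\s\S]*(?:cat[\s\S]*dog|dog[\s\S]*cat))[\s\S]*$/i;
// Same idea with .*: the scan stops at the first newline.
const firstLineOnly = /^(?!.*(?:cat.*dog|dog.*cat))[\s\S]*$/i;

const text = "first line\nthe cat chased the dog";
console.log(acrossLines.test(text));   // false: the pair is found on line 2
console.log(firstLineOnly.test(text)); // true: line 2 is never inspected
```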

Details:

^ - start of string
(?! - the negative lookahead check:
- white$ - the string can't equal white
- | - or
- [\s\S]*(?:cat[\s\S]*dog|dog[\s\S]*cat) - 0+ any chars followed by either cat, then any number of chars and dog, or vice versa
) - end of the lookahead
[\s\S]* - 0+ any chars
$ - end of string.
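Putting the pieces together, a minimal sketch of the full pattern used as an input validator. The helper name isAllowedInput is hypothetical, and the g flag is deliberately left off so that .test() keeps no state between calls:

```javascript
// Full pattern from this answer (i flag, no g flag).
const allowed = /^(?!white$|[\s\S]*(?:cat[\s\S]*dog|dog[\s\S]*cat))[\s\S]*$/i;

// Hypothetical helper, for illustration only.
function isAllowedInput(text) {
  return allowed.test(text);
}

const samples = ["A white tiger", "white", "White", "my cat", "hot dog", "dog ate the cat", "cat\n\ndog", ""];
for (const s of samples) {
  console.log(JSON.stringify(s), isAllowedInput(s));
}
```

With these samples, "white", "White", "dog ate the cat" and "cat\n\ndog" are rejected; the rest, including the empty string, are accepted.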

score 0 · Answer 2 · answered Aug 08 '16 at 15:19

You need to put the $ anchor inside your negative lookahead, right after the alternation group:

/^(?!(white|cat.*dog|dog.*cat)$)[\s\S]*$/gmi
//                   here >---^

This disallows just white but allows A white tiger.
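A sketch of this pattern in JavaScript, with the g and m flags dropped for the demo (with g, RegExp.prototype.test updates lastIndex between calls; m is only useful for line-by-line testing). Note that the $ applies to every alternative in the group, so cat/dog strings are rejected only when the whole string starts with one word and ends with the other. The samples are illustrative:

```javascript
// Answer 2's pattern without the g and m flags.
const answer2 = /^(?!(white|cat.*dog|dog.*cat)$)[\s\S]*$/i;

console.log(answer2.test("white"));                  // false
console.log(answer2.test("A white tiger"));          // true
console.log(answer2.test("cat and dog"));            // false: the whole string is cat...dog
console.log(answer2.test("I have a cat and a dog")); // true: $ also anchors cat.*dog
```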


PS: In JavaScript you can also use [^] instead of [\s\S], i.e.:

/^(?!(white|cat.*dog|dog.*cat)$)[^]*$/gmi
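A quick sketch checking that the two spellings behave the same on a few illustrative inputs. In JavaScript, [^] is an empty negated class, so it matches any single character, newlines included (flags g and m omitted here):

```javascript
const withClass = /^(?!(white|cat.*dog|dog.*cat)$)[\s\S]*$/i;
const withCaret = /^(?!(white|cat.*dog|dog.*cat)$)[^]*$/i;

for (const s of ["white", "A white tiger", "line 1\nline 2"]) {
  // Same result from both patterns for each sample.
  console.log(withClass.test(s) === withCaret.test(s));
}
```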
