Don't match if character plus space is present before a specific word

Question

This regex matches "so" when it's preceded by a comma and a space:

(,\s)(so)

I want to do the opposite now. I want to tell the regex: don't match "so" if it's preceded by a comma and a space. I tried this (after seeing this SO question):

^(,\s)(so)

But now the regex doesn't match anything: https://regexr.com/4kgq8.

Note: I'm not trying to match beginning of the line.

You may use negative lookbehind `/(?<!, )so/g` but it is only supported in modern browsers. – anubhava, Sep 06 '19 at 05:26

Emma · Answer 1 · 2019-09-06T05:47:07.270

Or you can simply use alternation, if that'd be OK, with an expression without lookarounds such as:

, so|(\bso\b)

const regex = /, so|(\bso\b)/gmi;
const str = `So that 
so big that 
, so
and not , so
, so so`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.

edited Sep 06 '19 at 05:47

answered Sep 06 '19 at 05:27

OP is not trying to match start-of-line, they were (incorrectly) guessing at a negation operator (by analogy from character class negation). – Amadan Sep 06 '19 at 05:29
This is not correct as it will fail to match 2nd `so` in `, so so` – anubhava Sep 06 '19 at 05:35
Sorry, but I'm using Atom's regex search ... not a JavaScript program. – alexchenco Sep 06 '19 at 05:46
...Then why tag with [tag:javascript] and not [tag:atom-editor]?!? – Amadan Sep 06 '19 at 05:48

Amadan · Answer 2 · 2019-09-06T05:51:56.687

As anubhava mentions in the comment, with negative lookbehind you could do /(?<!,\s)(so)/, which matches so that is not preceded by a comma and a space (and captures so). This is the reverse of /(?<=,\s)(so)/, which matches so that is preceded by a comma and a space.
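To make the contrast between the two lookbehinds concrete, here is a minimal sketch. The sample string is made up for illustration, and it assumes an engine that supports lookbehind and `String.prototype.matchAll`.

```javascript
// Made-up sample text for illustration.
const text = 'so big, so small, and so on';

// Negative lookbehind: "so" NOT preceded by a comma and whitespace.
const notAfterComma = [...text.matchAll(/(?<!,\s)(so)/g)].map((m) => m.index);

// Positive lookbehind: "so" that IS preceded by a comma and whitespace.
const afterComma = [...text.matchAll(/(?<=,\s)(so)/g)].map((m) => m.index);

console.log(notAfterComma); // [0, 22]
console.log(afterComma);    // [8]
```

Because lookbehinds are zero-width, each reported index is the position of so itself, not of the comma.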

Your regexp /(,\s)(so)/ matches a comma, a space and so (and captures the comma and the space in one group, and so in another). The negation of that can be constructed using a negative lookahead, which is supported in all browsers, like so: /((?!,\s)..|^.?)(so)/. It will match two characters (or fewer, if at the start of the string) that are not a comma and a space, then so (capturing both the preceding characters and so).

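Typically, this second approach has a drawback: when you match more than you want, the restriction against overlapping matches might make you lose a match here and there. In this particular case the risk is small, since the only characters that can overlap are the two before so, and a consumed comma and space would have ruled the match out anyway. It is not zero, though: in so so, the first match consumes the o that the second match needs as one of its two preceding characters, so the second so is skipped.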
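As a rough sketch of how the lookahead-based expression could be used from code: since the match also includes the preceding characters, the position of so has to be derived from the length of group 1. The sample string is invented, and its occurrences of so are deliberately kept well apart.

```javascript
// Made-up sample text; occurrences of "so" are kept well apart.
const text = 'so big, so small, and so on';

// Group 1: up to two preceding characters that are not comma + whitespace.
// Group 2: the "so" itself.
const regex = /((?!,\s)..|^.?)(so)/g;

const positions = [];
let m;
while ((m = regex.exec(text)) !== null) {
  // The match starts at the preceding characters, so skip over group 1
  // to get the index of "so".
  positions.push(m.index + m[1].length);
}

console.log(positions); // [0, 22]
```

The so at index 8 is absent from the result because it is preceded by a comma and a space.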

(EDIT: I wrote "space" here but all the expressions are written following OP's use of \s, which is actually for whitespace, which includes more than just space.)
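The difference between a literal space and \s only shows up when some other whitespace follows the comma. A small sketch with an invented string, in which the second so comes after a comma and a tab (again assuming lookbehind and `matchAll` are available):

```javascript
// Made-up sample: the second "so" follows a comma and a TAB, not a space.
const text = 'a, so b,\tso c';

// Literal space in the lookbehind: only ", so" is excluded.
const withSpace = [...text.matchAll(/(?<!, )so/g)].map((m) => m.index);

// \s in the lookbehind: a comma followed by any whitespace excludes the match.
const withWhitespace = [...text.matchAll(/(?<!,\s)so/g)].map((m) => m.index);

console.log(withSpace);      // [9]
console.log(withWhitespace); // []
```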

anubhava · Accepted Answer · 2019-09-06T05:54:57.127

I want to tell the regex: don't match "so" if it's preceded by a comma and a space.

The best solution is to use a negative lookbehind, as I mentioned in my comment below the question:

/(?<!, )so/g

Here, (?<!, ) is a negative lookbehind expression that fails the match if a comma and a space are present before so.

RegEx Demo 1

The caveat is that lookbehind support in JavaScript is only available in modern browsers.
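One possible way to check for that support at run time is sketched below. The helper name `supportsLookbehind` is hypothetical and not part of any library; the idea is only that an engine without lookbehind throws a SyntaxError when asked to compile such a pattern.

```javascript
// Hypothetical helper (not from any library): true when this engine can
// compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    // Build the pattern from a string so that an old engine fails here,
    // at run time, instead of rejecting the whole script at parse time.
    new RegExp('(?<!, )so');
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind() ? 'lookbehind available' : 'lookbehind not available');
```

Writing the pattern as a regex literal would not work for this check, because an unsupported literal makes the whole script fail to parse.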

If you also want to support legacy or older browsers, then the approach is to use a capture group and discard the unwanted match in an alternation:

/(?:, so|(\bso))\b/g

RegEx Demo 2

Here, a desired match is identified by the presence of capture group #1. The left-hand side of the alternation matches and discards the unwanted ", so". The desired so is matched on the right-hand side of the alternation and captured in group #1.

Code:

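var arr = ['So that',
'so big that',
', so so'];

// Group 1 is set only when "so" is not preceded by ", "
const regex = /(?:, so|(\bso))\b/g;

arr.forEach((el) => {
  regex.lastIndex = 0; // a /g regex remembers its position between exec() calls
  let m;
  // Walk through every match in the line, not just the first one
  while ((m = regex.exec(el)) !== null) {
    // m[1] is undefined when the discarded ", so" branch matched
    if (m[1] !== undefined)
      console.log('Line: [', el, '] Start:', m.index, m[1]);
  }
});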

edited Sep 06 '19 at 05:54

answered Sep 06 '19 at 05:45

Thanks, but I'm using Atom's regex search feature, so I can't choose groups. – alexchenco Sep 06 '19 at 05:56
If lookbehind is supported in Atom's regex engine then that will be best solution. – anubhava Sep 06 '19 at 05:57
Oh, I updated Atom and the negative lookbehind seems to work now. Thanks. – alexchenco Sep 06 '19 at 06:02
Great, negative lookbehind is best possible solution for this requirement. – anubhava Sep 06 '19 at 06:03
