There are a lot of questions about lookbehind, but I couldn't find my answer.

My RegExp with a negative lookbehind works fine in Chrome but not in IE.

I need a regular expression that matches any word after a period (.), but the period must not be preceded by "Mr". For example, in

'I met Mr. Jack this evening. He is a good man'

I want to get He, but not Jack (since it is preceded by Mr.)

So far I have come up with the following regex, which works fine in Chrome but not in IE, and it has to run in IE.

/(?<!Mr)\. *\b\w+\b/gi

Now I need an alternative to this regex that works in IE. Later I will also need to exclude Miss., Mrs. and Dr.
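For reference, here is a minimal sketch of the behaviour the IE-compatible alternatives need to reproduce. It relies on ES2018 lookbehind, so it assumes a modern engine such as current Chrome or Node.js; IE rejects the pattern as a syntax error. The `extended` pattern (Mr, Mrs, Miss, Dr) and the second test sentence are illustrative additions, not part of the question.

```javascript
// Reference behaviour only: (?<!...) is ES2018 syntax, so this needs a
// modern engine (e.g. current Chrome or Node.js) and will not run in IE.
var sample = "I met Mr. Jack this evening. He is a good man";

// The question's original pattern: a period not preceded by "Mr".
var original = /(?<!Mr)\. *\b\w+\b/gi;

// Hypothetical extension to Mr, Mrs, Miss and Dr. JavaScript lookbehind
// accepts alternatives of different lengths.
var extended = /(?<!Mrs?|Miss|Dr)\. *\b\w+\b/gi;

console.log(sample.match(original)); // [ '. He' ]
console.log("Hi Mrs. Jones. Hi Miss. Smith. Hi Dr. Who. End".match(extended));
// [ '. Hi', '. Hi', '. End' ]
```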

Poul Bak
Iqbal
  • Possible duplicate of [Javascript: negative lookbehind equivalent?](https://stackoverflow.com/questions/641407/javascript-negative-lookbehind-equivalent) – Tom Lord Nov 05 '18 at 18:02
  • For the record, it ONLY works in Chrome; all other browsers will fail on a lookbehind. – Poul Bak Nov 05 '18 at 18:54

4 Answers

One workaround is to reverse the string and then use negative lookaheads (which all browsers support); see: https://stackoverflow.com/a/11347100/1954610

Alternatively, you can use negative lookaheads on the original string too, but it's a bit awkward. Here's a solution that only excludes Mr:

/((?!Mr).{2}|^.?)\. *\b\w+\b/gi

In particular, note the edge cases I had to cover here: the match can happen at the start of the string after 0-1 characters, or after 2 characters that are not "Mr".
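A minimal sketch of how the Mr-only pattern could be used from JavaScript. The function name `wordsAfterPeriodExceptMr` is hypothetical, and the second capture group around `\w+` is an addition so the word can be read directly instead of slicing the full match.

```javascript
// Emulates (?<!Mr) without lookbehind: the two characters before the period
// must not be "Mr", or the period sits at most one character into the string.
// The capture group around \w+ is an added convenience, not part of the answer.
function wordsAfterPeriodExceptMr(text) {
  var re = /((?!Mr).{2}|^.?)\. *\b(\w+)\b/gi;
  var words = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    words.push(m[2]); // group 2 holds the word after the period
  }
  return words;
}

console.log(wordsAfterPeriodExceptMr("I met Mr. Jack this evening. He is a good man"));
// [ 'He' ]
```

Because `.{2}` consumes the two characters in front of each period, back-to-back short sentences can be missed: `wordsAfterPeriodExceptMr("A. B. C")` returns `["B"]` rather than `["B", "C"]`.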

Extending this to also exclude Dr is quite easy:

/((?![MD]r).{2}|^.?)\. *\b\w+\b/gi

However, extending this to also exclude Mrs and Miss is much harder, since you now need to account for lookaheads of different lengths. Such a regex ends up very confusing. Here's my best attempt, but I'm not entirely convinced it covers all edge cases (perhaps someone can cross-check it?):

/(^.?|(?!Miss)(^|.)(?!Mrs)(^|.)(?![MD]r).{2})\. *\b\w+\b/

Demo
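One cheap way to cross-check the pattern is to run it over a test string that contains every honorific. The sketch below copies the regex unchanged except for an added `g` flag; the helper name `wordsAfterPeriod` and the sample sentence are hypothetical.

```javascript
// Runs the Miss/Mrs/Mr/Dr lookahead pattern over a string and returns the
// word after each accepted period. Only the g flag was added to the regex.
function wordsAfterPeriod(text) {
  var re = /(^.?|(?!Miss)(^|.)(?!Mrs)(^|.)(?![MD]r).{2})\. *\b\w+\b/g;
  var words = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    // Each match ends with the word itself, so take the trailing \w+ run.
    words.push(/\w+$/.exec(m[0])[0]);
  }
  return words;
}

console.log(wordsAfterPeriod("Hi Mrs. Jones. Hi Miss. Smith. Hi Dr. Who. End"));
// [ 'Hi', 'Hi', 'End' ]
```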

...Or, admittedly as a very ugly workaround, here's a regex to run against the reversed string:

\b\w+\b *\.(?!(rM|rD|srM|ssiM))

Demo
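A sketch of the reverse-the-string workaround wrapped in a helper, assuming plain text without surrogate pairs (reversing with `split("")` would break them). The helper names `reverse` and `wordsAfterPeriod` are hypothetical; the regex is the one above with a `g` flag added.

```javascript
// On the reversed text, "not preceded by Mr" becomes "not followed by rM",
// which a negative lookahead (supported in IE) can express.
function reverse(s) {
  return s.split("").reverse().join("");
}

function wordsAfterPeriod(text) {
  // "rM", "rD", "srM", "ssiM" are "Mr", "Dr", "Mrs", "Miss" spelled backwards.
  var re = /\b\w+\b *\.(?!(rM|rD|srM|ssiM))/g;
  var reversed = reverse(text);
  var words = [];
  var m;
  while ((m = re.exec(reversed)) !== null) {
    // The match starts with the reversed word; take it and flip it back.
    words.push(reverse(/^\w+/.exec(m[0])[0]));
  }
  // Matches come out right-to-left, so restore the original order.
  return words.reverse();
}
```

Unlike the forward lookahead version, the text in front of each period is only inspected by the lookahead, not consumed, so `wordsAfterPeriod("A. B. Mrs. C. D")` returns `["B", "Mrs", "D"]`.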

Tom Lord
  • Does not work for words that end with one of the excluded substrings, such as "dismiss". Adding `\b` to the lookahead fixes this. https://regex101.com/r/mG2mDJ/2 – doom87er Nov 05 '18 at 21:28
  • @doom87er You're right, and that's probably an improvement. However, the behaviour of my regex is actually consistent with what OP originally implemented; there was no word boundary before the lookbehind. – Tom Lord Nov 05 '18 at 21:32

You could make use of capturing groups with this pattern:

bad_sequence|(good_sequence)

We do actually match the bad stuff, but we only "remember" the valid results by virtue of the capturing parentheses around the second part of the alternation.

So it becomes simply this (note how we use non-capturing, "grouping only" parentheses in the first part):

(?:Mr|Mrs|Miss|Dr)\.\s*|\.\s*(\w+)

Your "valid words coming after a period" are now in group 1.
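A minimal sketch of reading group 1 from this pattern with `RegExp.prototype.exec`; the pattern itself avoids lookbehind, so it should also work in IE. The function name is hypothetical.

```javascript
// "bad_sequence|(good_sequence)": honorific periods are matched and thrown
// away; only the second alternative captures a word in group 1.
function wordsAfterPeriod(text) {
  var re = /(?:Mr|Mrs|Miss|Dr)\.\s*|\.\s*(\w+)/g;
  var words = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    // Group 1 is unset when the first alternative matched. Some old IE
    // versions report an unset group as "" instead of undefined, so a
    // truthiness check covers both cases.
    if (m[1]) {
      words.push(m[1]);
    }
  }
  return words;
}
```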

Scott Weaver

DEMO

(?!(?:Miss|Mr|Dr)\.)(?:\b\w+\b)(\. *\b\w+\b)

Input:

I met Mr. Jack this evening. He is a good man. And Miss. Jack is a good woman. Dr. Jack, how ever is not that great

Output:

. He
. And
. Dr

Fortunately, IE does support negative lookahead. Expanding your pattern \. *\b\w+\b to also match the word before the period lets you reject the match with a lookahead placed in front of that word, while capturing the second part.
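A sketch of pulling the words out of group 1 for the input above. The helper name is hypothetical, and the leading period is stripped only for readability.

```javascript
// Group 1 captures ". word"; the lookahead rejects Miss./Mr./Dr. as the
// word before the period.
function wordsAfterPeriod(text) {
  var re = /(?!(?:Miss|Mr|Dr)\.)(?:\b\w+\b)(\. *\b\w+\b)/g;
  var words = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    words.push(m[1].replace(/^\. */, "")); // drop the period and spaces
  }
  return words;
}

var input = "I met Mr. Jack this evening. He is a good man. And Miss. Jack is a good woman. Dr. Jack, how ever is not that great";
console.log(wordsAfterPeriod(input)); // [ 'He', 'And', 'Dr' ]
```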

doom87er
  • This fails for input such as `Drink. Me`, since you require the letters before the period to start with a word boundary, but they also cannot begin with `dr`. – Tom Lord Nov 05 '18 at 21:14
  • It also behaves differently for input like `ASMR. Something`, since you consider that a match but a negative lookbehind would not. – Tom Lord Nov 05 '18 at 21:18
  • @TomLord thanks, edited post to fix that. I'm assuming that matching `ASMR. Something` is intended behavior. If not, change `(?!(?:Miss|Mr|Dr)\.)` to: `(?!\b(?:Miss|Mr|Dr)\b)` – doom87er Nov 05 '18 at 21:40
  • This also (but again, debatable whether it's the correct behaviour!) fails when there is *no word* immediately before the period. For example, for the input: `This... Is an edge case` https://regex101.com/r/23hptX/4 – Tom Lord Nov 05 '18 at 22:01

I would do this in two steps: first remove the unwanted words, then search the cleaned string for periods. Here's the first regex:

/(?:Mr|Mrs|Miss|Dr)\./gi

Now replace those matches with an empty string.

Now match the fixed string with this regex:

/\.\s*\b\w+\b/gi

That will give the result you want.
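A minimal sketch of the two-step approach in JavaScript. It assumes the second pattern is meant to start with `\.` (the step is about searching for periods) and adds a capture group around the word; the helper name is hypothetical.

```javascript
// Two-step approach: strip the honorifics first, then look for words after
// the periods that remain.
function wordsAfterPeriod(text) {
  // Step 1: remove "Mr.", "Mrs.", "Miss." and "Dr." entirely.
  var cleaned = text.replace(/(?:Mr|Mrs|Miss|Dr)\./gi, "");
  // Step 2: every remaining period is treated as a sentence boundary.
  var re = /\.\s*\b(\w+)\b/gi;
  var words = [];
  var m;
  while ((m = re.exec(cleaned)) !== null) {
    words.push(m[1]);
  }
  return words;
}
```

For the input `Example. Mr. Foo` from the comment on this answer, this returns `["Foo"]`, while the original lookbehind pattern would match `Mr`.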

Poul Bak
  • This fails, for example, for the input: `Example. Mr. Foo`. The expected result should be a match for `Mr`, but with your modification the match becomes `Foo`. – Tom Lord Nov 05 '18 at 19:41