I have seen this answer proposed in this question

However, the resulting match is not the same. When the match is at the beginning of the string, only the matched text is returned; when it follows a whitespace character, the whitespace is also returned as part of the match, even though the non-capturing colon is used.

I tested with the following code in the Firefox console:

let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)un/gi;
console.log(str1.match(reg)); // ["un"]
console.log(str2.match(reg)); // [" un"]

Why is the whitespace being returned?

Robin Mackenzie
Javier Mr

2 Answers

3

The colon in (?:^|\s) just means that it's a non-capturing group. In other words, when reading, back-referencing, or replacing with the captured group values, it will not be included. Without the colon, it would be reference-able as \1, but with the colon, there is no way to reference it. However, non-capturing groups are by default still included in the match. For instance My (?:dog|cat) is sick will still include the word dog or cat in the match, even though it's a non-capturing group.
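As a small sketch of the difference described above (the sample sentence and variable names are only illustrative), compare what `String.prototype.match` returns for the non-capturing and the capturing version of the same pattern:

```javascript
const sentence = "My dog is sick";

// Non-capturing: the group takes part in the match but gets no index of its own.
const nonCapturing = sentence.match(/My (?:dog|cat) is sick/);
console.log(nonCapturing[0]);     // "My dog is sick"
console.log(nonCapturing.length); // 1

// Capturing: same full match, plus the group's text at index 1.
const capturing = sentence.match(/My (dog|cat) is sick/);
console.log(capturing[0]); // "My dog is sick"
console.log(capturing[1]); // "dog"
```

In both cases the full match contains the word dog; the colon only decides whether that word is additionally exposed as group 1.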

To exclude the whitespace from the match, you have two options. If your regex engine supports look-behinds, you can use a positive one, such as (?<=^|\s). If it does not (JavaScript only gained look-behind support in ES2018, so older engines lack it), you can put a capturing group around just the part you want and then read that group's value rather than the whole match (e.g., (?:^|\s)(un)). For instance:

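let input = "ejemplo uno";   // sample string from the question
let reg = /(?:^|\s)(un)/gi;
let match = reg.exec(input); // match[0] is the full match, " un"
let result = match[1];       // "un", the capturing group only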
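For engines that do support look-behind assertions (an assumption about your runtime; they were standardized in ES2018), a minimal sketch of the first option could look like this. The look-behind checks for the start of the string or a whitespace character without consuming it:

```javascript
// Positive look-behind: require start-of-string or whitespace before "un",
// but do not include it in the match.
const lookbehind = /(?<=^|\s)un/gi;

const atStart = "un ejemplo".match(lookbehind);
const afterSpace = "ejemplo uno".match(lookbehind);

console.log(atStart);    // ["un"]
console.log(afterSpace); // ["un"]
```

With this pattern the whole match is already just the text you want, so it can also be passed to a replace call without touching the preceding whitespace.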
Steven Doggart
  • I've executed: `reg2 = /(?:^|\s)(un)/gi; str2.match(reg2)` after the commands in my question and still get back the match with the whitespace – Javier Mr Sep 21 '17 at 11:27
  • `match` returns the full match. You need to read just the capturing-group's value. I added an example. – Steven Doggart Sep 21 '17 at 11:30
  • Oops. Sorry about the confusion. I forgot to change `string.match` to `reg.exec`. I updated my answer to correct it. – Steven Doggart Sep 21 '17 at 12:33
  • Thanks for the response. Unfortunately I'm also using this same regex in a replace call on a string, so when the whitespace is returned in the result, the whitespace is removed by the replace (this didn't happen with the \b). Using your solution does not work with the replace, but it is the answer to my original question. – Javier Mr Sep 21 '17 at 13:22
  • @JavierMr You can reference the value of any captured group in your replacement pattern by `$1`, `$2`, etc. In other words, if you change it so that it does capture the space, like `(^|\s)(un)`, then you could replace with `$1whatever`. – Steven Doggart Sep 21 '17 at 13:23
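The suggestion in the last comment can be sketched as follows. The replacement text `XX` is a hypothetical placeholder; `$1` puts back whatever the first group matched (an empty string at the start of the input, or the whitespace character otherwise):

```javascript
// Capture the leading boundary so it can be re-inserted via $1.
const reg = /(^|\s)(un)/gi;

const replaced1 = "un ejemplo".replace(reg, "$1XX");
const replaced2 = "ejemplo uno".replace(reg, "$1XX");

console.log(replaced1); // "XX ejemplo"
console.log(replaced2); // "ejemplo XXo"
```

The space before the match survives in the second string because it is written back by `$1` rather than discarded with the rest of the match.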
1

One solution would be to use a capturing group (i.e. (un)) so that you can call RegExp.prototype.exec() and then read match[1] from its result to get the matched string, like this:

let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)(un)/gi;
let match1 = reg.exec(str1);
reg.lastIndex = 0; // with the g flag, exec() resumes from lastIndex, so reset it before a new string
let match2 = reg.exec(str2);
console.log(match1[1]); // "un"
console.log(match2[1]); // "un"
Angelos Chalaris