
I want to extract each block of alphanumeric characters that comes after an underscore in a JavaScript string. I currently have it working using a combination of string methods and regex, like so:

var string = "ignore_firstMatch_match2_thirdMatch";    
var firstValGone = string.substr(string.indexOf('_'));
// returns "_firstMatch_match2_thirdMatch"
var noUnderscore = firstValGone.match(/[^_]+/g);
// returns ["firstMatch", "match2" , "thirdMatch"]

Is there a way to do it purely with a regex? The best I've managed is:

var string = "ignore_firstMatch_match2_thirdMatch";
var matchTry = string.match(/_[^_]+/g);
// returns ["_firstMatch", "_match2", "_thirdMatch"]

but that returns the preceding underscore too. Given that you can't use lookbehinds in JS, I don't know how to match the characters after the underscore while excluding the underscore itself. Is this possible?

chrBrd

3 Answers


Since lookbehind is not supported in JS, the only way I can think of is to use a capture group like this.

Regex: _([^_]+), then read capture group 1 (index 1 of the array returned by exec, or $1 in a replacement string).

Regex101 Demo

var myString = "ignore_firstMatch_match2_thirdMatch";
var myRegexp = /_([^_]+)/g;

var match = myRegexp.exec(myString);
while (match !== null) {
  // match[0] is the whole match including "_"; match[1] is the captured text
  document.getElementById("match").innerHTML += "<br>" + match[1];
  match = myRegexp.exec(myString);
}
<div id="match">

</div>

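An alternative way, using a lookahead, would be something like this.

But it takes a long time in JS and killed my page three times. That is most likely not catastrophic backtracking (ReDoS): the lookahead match is zero-length, so an exec loop keeps matching at the same position unless lastIndex is advanced manually.

Regex: (?=_([A-Za-z0-9]+)), then read capture group 1 from each match.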

Regex101 Demo
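
A minimal sketch of the lookahead variant, assuming the freeze comes from zero-length matches rather than from backtracking: the pattern consumes no characters, so `exec` leaves `lastIndex` where the match started and the loop has to move it forward by hand. The helper name `extractAfterUnderscores` is made up for illustration.

```javascript
// Hypothetical helper: collects the text after each underscore using
// the lookahead pattern from the answer above.
function extractAfterUnderscores(input) {
  var re = /(?=_([A-Za-z0-9]+))/g;
  var results = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    results.push(m[1]);
    // The match is empty, so lastIndex equals m.index here.
    // Advance it manually, otherwise exec would match the same spot forever.
    if (m.index === re.lastIndex) {
      re.lastIndex++;
    }
  }
  return results;
}

var lookaheadResult = extractAfterUnderscores("ignore_firstMatch_match2_thirdMatch");
// lookaheadResult is ["firstMatch", "match2", "thirdMatch"]
```

The capture-group version without a lookahead does not need this guard, because each match consumes at least the underscore.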


You can use a capture group (_([^_]+)) and use RegExp#exec in a loop while pushing the captured values into an array:

var re = /_([^_]+)/g; 
var str = 'ignore_firstMatch_match2_thirdMatch';
var res = [];
 
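var m; // declared explicitly to avoid creating an implicit global
while ((m = re.exec(str)) !== null) {
    res.push(m[1]); // m[1] holds the text captured after the underscore
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>";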

Note that String#match() with a regex that has the global modifier /g returns only the full matches and discards all captured groups, which is why you cannot just use str.match(/_([^_]+)/g).
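
To illustrate the point about `/g` and captures, here is a small sketch. It also shows `String.prototype.matchAll`, which was added to the language (ES2020) after this answer was written and is assumed to be available in your runtime; the variable names are illustrative.

```javascript
var sample = "ignore_firstMatch_match2_thirdMatch";

// With /g, match() returns only the full matches; the groups are dropped.
var fullMatches = sample.match(/_([^_]+)/g);
// fullMatches is ["_firstMatch", "_match2", "_thirdMatch"]

// matchAll() yields one result per match, each with its capture groups,
// so group 1 can be picked out without a manual exec loop.
var captured = Array.from(sample.matchAll(/_([^_]+)/g), function (m) {
  return m[1];
});
// captured is ["firstMatch", "match2", "thirdMatch"]
```

On runtimes without `matchAll`, the `exec` loop above remains the portable option.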

Wiktor Stribiżew

Why do you assume you need regex? A simple split will do the job:

string str = "ignore_firstMatch_match2_thirdMatch";
IEnumerable<string> matches = str.Split('_').Skip(1);
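
The snippet above is C# (`Split` plus LINQ's `Skip`), whereas the question is about JavaScript. A rough JavaScript equivalent, offered only as a sketch, would combine `split` and `slice`; the variable names are illustrative.

```javascript
var source = "ignore_firstMatch_match2_thirdMatch";

// Split on every underscore, then drop the first piece ("ignore").
var parts = source.split("_").slice(1);
// parts is ["firstMatch", "match2", "thirdMatch"]
```

Unlike the `[^_]+` regex, this keeps empty strings when the input contains consecutive or trailing underscores, so the two approaches only agree on inputs shaped like the example.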
maraaaaaaaa