How does .split(/_(.+)?/)[i] work?

Question

After finding this solution useful,

split string only on first instance of specified character

I'm confused about how this actually works. One top comment explains, "Just to be clear, the reason this solution works is because everything after the first _ is matched inside a capturing group, and gets added to the token list for that reason." - @Alan Moore

That doesn't make sense to me; what's a "capturing group"? Additionally, the author's positive-rated solution,

"good_luck_buddy".split(/_(.+)?/)[1]
"luck_buddy"

is noted in the comments as something that can be improved by omitting the question mark, ?,

split(/_(.+)/)

or omitting the question mark and replacing the plus sign, +, with an asterisk, *.

split(/_(.*)/)

Which is actually the best solution and why? Thank you.

Maybe this - http://stackoverflow.com/questions/18577704/why-capturing-group-results-in-double-matches-regex - is a better source. — Wiktor Stribiżew, Mar 18 '16 at 16:06
@WiktorStribiżew Thank you for the attention on this post. Please consider that while the answer may have been relatively similar to what I'm seeking with this question, the question, "Why capturing group results in double matches regex" is not the same as what I'm asking. Thank you! — 8protons, Mar 18 '16 at 16:08
There are a lot of these questions actually, just [search Google](https://www.google.pl/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=stackoverflow+If+separator+is+a+regular+expression+that+contains+capturing+parentheses) If you ask how `.*` works, your question is a dupe of [Learning Regular Expressions](http://stackoverflow.com/a/2759417/3832970). — Wiktor Stribiżew, Mar 18 '16 at 16:10
I still don't feel that the lengthy answer you shared is what I'm seeking. I'm looking for a concise, _specific_ solution to _exactly_ what I am asking. If you find that, let me know. Thanks! — 8protons, Mar 18 '16 at 16:24
I think that the answer to your *exact* question (How does .split(/_(.+)?/)[i] work?) can't be concise. You would need to understand what each component of that regex does, and there are quite a few things going on. Better to ask some of the components, which Google and SO can help you understand (e.g. "what is a regex capture group"). — Charlie Schliesser, Mar 18 '16 at 16:31

score 6 · Accepted Answer · edited May 23 '17 at 11:45

"good_luck_buddy".split(/_(.+)?/)

doesn't really make much sense. It's essentially the same as

"good_luck_buddy".split(/_(.*)/)

("match 1 or more, optionally" is the same as "match 0 or more").
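One caveat, shown as a small sketch: the two patterns match the same text, but they can report the capture differently when nothing follows the underscore. The input "good_" is a made-up edge case, not taken from the question.

```javascript
// Hypothetical edge-case input: nothing follows the underscore.
const edgeInput = "good_";

// (.+)? : the optional group does not participate, so split emits undefined.
const optionalPlus = edgeInput.split(/_(.+)?/);

// (.*) : the group participates and captures an empty string.
const star = edgeInput.split(/_(.*)/);

// (.+) : at least one character is required after "_", so nothing matches.
const plus = edgeInput.split(/_(.+)/);

console.log(optionalPlus); // [ 'good', undefined, '' ]
console.log(star);         // [ 'good', '', '' ]
console.log(plus);         // [ 'good_' ]
```

For "good_luck_buddy" itself, all three variants give the same result.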

The behaviour of regex.split in most languages is "take pieces of string that do not match":

"a_@b_@c".split(/_@/) => ["a", "b", "c"]

If the split expression contains capturing groups (...), these are also included in the resulting list:

"a_@b_@c".split(/_(@)/) => ["a", "@", "b", "@", "c"]
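To answer "what's a capturing group" in code: parentheses capture the text they match, while (?:...) groups without capturing. A minimal sketch using the same sample string; the variable names are only for illustration.

```javascript
const sample = "a_@b_@c";

// No group: only the pieces between matches are returned.
const noGroup = sample.split(/_@/);

// Capturing group (...): the captured text is spliced in between the pieces.
const capturing = sample.split(/_(@)/);

// Non-capturing group (?:...): groups for syntax only, nothing extra is added.
const nonCapturing = sample.split(/_(?:@)/);

console.log(noGroup);      // [ 'a', 'b', 'c' ]
console.log(capturing);    // [ 'a', '@', 'b', '@', 'c' ]
console.log(nonCapturing); // [ 'a', 'b', 'c' ]
```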

So the above code

"good_luck_buddy".split(/_(.*)/)

works as follows:

1. It finds the first piece of the string that doesn't match _(.*). This is good.
2. It finds a piece that does match _(.*). This is _luck_buddy. Since there's a capturing group, its content (luck_buddy) is also included in the output.
3. Finally, it finds the next piece that doesn't match _(.*). This is an empty string, and it's added to the output, so the output becomes ["good", "luck_buddy", ""].
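The steps above can be retraced by hand with RegExp.prototype.exec, which exposes the match position and the captured group. This is only an illustration of the walkthrough, not how split is implemented internally; the variable names are made up.

```javascript
const input = "good_luck_buddy";
const match = /_(.*)/.exec(input);

// Piece before the match.
const before = input.slice(0, match.index);

// The whole match, including the underscore.
const whole = match[0];

// The content of the capturing group.
const captured = match[1];

// Piece after the match; here the match runs to the end of the string.
const after = input.slice(match.index + whole.length);

console.log(before);   // good
console.log(whole);    // _luck_buddy
console.log(captured); // luck_buddy

// The same three pieces that split returns.
console.log(input.split(/_(.*)/));    // [ 'good', 'luck_buddy', '' ]
console.log(input.split(/_(.*)/)[1]); // luck_buddy
```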

To address the "what's the best" part, I'd use the second-most-voted solution for a literal splitter:

result = str.slice(str.indexOf('_') + 1)

and .replace for a regex splitter:

result = str.replace(/.*?<regex>/, '')
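A minimal sketch of both forms on concrete input. The character class [_-] stands in for the <regex> placeholder and is an assumption made for illustration, as is the hyphenated sample string.

```javascript
const text = "good_luck_buddy";

// Literal separator, as in the answer: everything after the first "_".
const viaSlice = text.slice(text.indexOf("_") + 1);

// Regex separator: <regex> replaced by the made-up class [_-].
// The lazy .*? stops at the first character that the class accepts.
const viaReplace = "good-luck_buddy".replace(/.*?[_-]/, "");

// When the separator is absent, both forms return the input unchanged:
// indexOf gives -1, so slice(0) copies the string; replace finds no match.
const noSepSlice = "good".slice("good".indexOf("_") + 1);
const noSepReplace = "good".replace(/.*?[_-]/, "");

console.log(viaSlice);     // luck_buddy
console.log(viaReplace);   // luck_buddy
console.log(noSepSlice);   // good
console.log(noSepReplace); // good
```

Note that the `+ 1` in the slice form assumes a one-character separator.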

score 0 · Answer 2 · answered Mar 18 '16 at 16:34

I'm not going to explain how basic RegEx works ("what is a capture group" ...). But to answer your question "which is best and why": It's just a matter of performance. Different regexes result in different processing times in the regex processor.

See this jsperf comparison: http://jsperf.com/regex-split-on-first-occurence-of-char

I tested IE11, FF and Chrome. There is not really a noticeable difference between the three regex variants in this case.

score 0 · Answer 3 · edited May 23 '17 at 11:45

No need for a regular expression. Just find the index of the '_' (underscore) and get the substring.

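function head(str, pattern) {
  // Everything before the first occurrence of pattern ('' if absent).
  var index = str.indexOf(pattern);
  return index > -1 ? str.substring(0, index) : '';
}

function tail(str, pattern) {
  // Everything after the first occurrence of pattern ('' if absent).
  // Skip the full pattern length so multi-character patterns work too.
  var index = str.indexOf(pattern);
  return index > -1 ? str.substring(index + pattern.length) : '';
}

function foot(str, pattern) {                              // Made this one up...
  // Everything after the last occurrence of pattern ('' if absent).
  var index = str.lastIndexOf(pattern);
  return index > -1 ? str.substring(index + pattern.length) : '';
}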

var str = "good_luck_buddy";
var pattern = '_';

document.body.innerHTML  = head(str, pattern) + '<br />';
document.body.innerHTML += tail(str, pattern) + '<br />';
document.body.innerHTML += foot(str, pattern);

If you want to find the index of a pattern (regex) in a string, this question will show you the way:

Polyfill for String.prototype.regexIndexOf(regex, startpos)
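For a regex separator, the built-in RegExp.prototype.exec already reports where a match starts and how long it is, which is enough for regex-aware variants of head and tail. A minimal sketch; headRe and tailRe are hypothetical names and are not part of the linked polyfill.

```javascript
// Assumption: regex has no g or y flag, so exec always searches from index 0.
function headRe(str, regex) {
  // Everything before the first match ('' if there is no match).
  const m = regex.exec(str);
  return m ? str.slice(0, m.index) : '';
}

function tailRe(str, regex) {
  // Everything after the first match ('' if there is no match).
  const m = regex.exec(str);
  return m ? str.slice(m.index + m[0].length) : '';
}

console.log(headRe("good__luck_buddy", /_+/)); // good
console.log(tailRe("good__luck_buddy", /_+/)); // luck_buddy
```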
