4

I've got this regex: ([a-z]+)(?:\.).

Using it in JavaScript like this:
"test.thing.other".split(/([a-z]+)(?:\.)/); ends up giving me an array like this:
["", "test", "", "thing", "other"].
I have no idea why the first and third elements are being put into that array. Can anyone show me what I'm doing wrong?

James P. Wright

2 Answers

3

Based on your question and your comment "Capture a-z until a period", I believe you should rather be using String.match, like this:

arr = "test.thing.other".match(/[a-z]+(?=\.)/g);

gives:

["test", "thing"]
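
As a rough illustration of why the lookahead matters (the variable names below are only for this sketch, not from the question), compare a pattern that consumes the period with one that merely asserts that a period follows:

```javascript
const input = "test.thing.other";

// Consuming the period: each match includes the trailing dot.
const withDot = input.match(/[a-z]+\./g);

// Positive lookahead: a period must follow, but it is not part of the match.
const withoutDot = input.match(/[a-z]+(?=\.)/g);

console.log(withDot);    // ["test.", "thing."]
console.log(withoutDot); // ["test", "thing"]
```

Note that "other" is absent from both results, since no period follows it.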
anubhava
  • You are correct. When "split" started giving me results closer to what I wanted I started using that. I now see that I shouldn't. – James P. Wright Mar 11 '13 at 16:13
  • Good answer – but an explanation would be nice, in particular since your expression is using zero-width lookahead assertions which aren’t widely known. – Konrad Rudolph Mar 11 '13 at 16:18
  • @KonradRudolph: Thanks for the nice words. My answer was based on my understanding of OP's question and comments. OP mentioned that he wants to "Capture a-z until a period", which in other words means he wants to capture `[a-z]+` if it is followed by a period. That requirement is easily met by using a positive lookahead `(?=\.)` as I used in my answer. – anubhava Mar 11 '13 at 16:24
2

The parentheses are the reason. Here’s what MDN says on string.split:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array.

They also caution:

However, not all browsers support this capability.

So this result may be inconsistent. If you just want to split by the content of the expression, remove the parentheses:

>> 'test.thing.other'.split(/[a-z]+\./)
["", "", "other"]

Which may also not be what you want, but is the intuitively expected result given your expression.

If you want to split by dot then you need to provide exactly that in the regular expression: a dot.

>> 'test.thing.other'.split(/\./)
["test", "thing", "other"]
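
For comparison, the three separators discussed here can be lined up in one sketch. The variable names are illustrative only, and the commented results assume an engine that implements capturing groups in `split` as the MDN passage describes:

```javascript
const input = "test.thing.other";

// Capturing group in the separator: the captured text is spliced into the
// result, next to the (empty) pieces found between separator matches.
const withCapture = input.split(/([a-z]+)(?:\.)/);
// ["", "test", "", "thing", "other"]

// Same separator without the capturing group: only the pieces between
// separator matches remain, and those are empty here.
const withoutCapture = input.split(/[a-z]+\./);
// ["", "", "other"]

// The separator is just the dot.
const byDot = input.split(/\./);
// ["test", "thing", "other"]

console.log(withCapture, withoutCapture, byDot);
```

The empty strings in the first result are the pieces that lie between consecutive separator matches: "test." starts at index 0 and "thing." follows it immediately, so nothing is left in between.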
Konrad Rudolph
  • @0A0D you are testing the *matcher*, not the output of `split`. These methods behave very differently but if you check the “global” option on your test page you’ll nevertheless get a consistent result. – Konrad Rudolph Mar 11 '13 at 16:04
  • Hmm, it's an odd syntax nonetheless. I wonder if this applies: http://stackoverflow.com/a/812179/195488 –  Mar 11 '13 at 16:06
  • @KonradRudolph : That ends up nowhere near capturing what I want. My expression says: "Capture a-z until a period", which is exactly what I want. I still don't see how that ends up returning two blank instances based on the given string. – James P. Wright Mar 11 '13 at 16:06
  • @KonradRudolph: Yes, global gets you two matches. You would think that it would return the matches... what else is it splitting, really? It's an odd way to do a split. –  Mar 11 '13 at 16:07
  • I think I see the mistake... you have to match on the period. Split does return the matches, in a sense. –  Mar 11 '13 at 16:09
  • @0A0D Only if you use capturing groups in the expression, otherwise `split` does exactly what you’d normally expect from a splitting function, i.e. it uses the expression as the separator. – Konrad Rudolph Mar 11 '13 at 16:20
  • @KonradRudolph: I see, it's still weird. Use `match` instead. –  Mar 11 '13 at 16:21
  • @James Well your question said “I have no idea why the first and third elements are being put into that array” which implies that your expected result was equal to my last example. – Konrad Rudolph Mar 11 '13 at 16:21
  • @KonradRudolph : Check my result again. The first and third items are blank strings. :P – James P. Wright Mar 11 '13 at 16:24
  • @James Exactly: and if you take those out (because they were unexpected) you get, as the remainder, `["test", "thing", "other"]` which is the result from my expression. – Konrad Rudolph Mar 11 '13 at 16:25