
I had a task to split a string into groups of two characters.

So '031745' should become ["03", "17", "45"].

I took the regex approach and successfully managed to do it via:

'031745'.split(/(?=(?:..)+$)/);

// result: ["03", "17", "45"]

I'm aware of what's going on here: we split at every invisible (zero-width) position that is followed by one or more groups of two characters running all the way to the end of the string.
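One way to make those invisible positions visible is to try the lookahead at each index on its own. This is a minimal sketch (the `rows` table is only for illustration). It uses the sticky `y` flag so that each attempt starts exactly at `lastIndex` instead of scanning ahead:

```javascript
// Try the lookahead from the question at every index of the string.
// The sticky `y` flag pins each attempt to lastIndex, so the engine
// cannot move on to a later position by itself.
const str = '031745';
const lookahead = /(?=(?:..)+$)/y;

const rows = [];
for (let i = 0; i <= str.length; i++) {
  lookahead.lastIndex = i;
  const remaining = str.length - i; // characters left after index i
  rows.push({ index: i, remaining, matches: lookahead.test(str) });
}
console.table(rows);

// Only indices followed by a positive, even number of characters match.
const positions = rows.filter((r) => r.matches).map((r) => r.index); // [0, 2, 4]
console.log(positions);
```

`split` still yields only three pieces, because a zero-width match at the very start of the current piece (index 0 here) does not produce a cut.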

But there are two things which I find hard to explain:

1.

  • If I remove the end anchor $, I get this result:

    '031745'.split(/(?=(?:..)+)/);
    // result: ["0", "3", "1", "7", "45"]
    

    Why does removing $ affect the result? After all, we're just looking for repeated, non-overlapping pairs of characters.

2.

  • Why does changing the inner group from a non-capturing group to a capturing group yield a different result:

    '031745'.split(/(?=(..)$)/);
    // result: ["0317", "45", "45"]
    

    AFAIK, capturing groups are for backreferences and for capturing a group's text. It's still a group of two characters being repeated, so what makes (..) behave differently from (?:..) in this particular case?

NB: I know there are other approaches, but I want to stick with regex for learning purposes.

Martin Schneider
Royi Namir
  • sometimes `match` is shorter: `'031745'.match(/../g)`. – Nina Scholz Oct 11 '18 at 06:59
  • @NinaScholz Yes, indeed. It's just that splitting by regex is also an option, and I got hit by it twice! And I don't like to get hit without knowing why. – Royi Namir Oct 11 '18 at 07:02
  • The end sign is necessary to prevent splitting at each character, because only the last two get treated together. For all previous characters, the grouping of two is not visible to the regex. (That makes matching easier, because of the defined length of the group.) – Nina Scholz Oct 11 '18 at 07:05
  • @wiktor ... Oh come on, do you really think that the dup for `$` is the same? I know perfectly well what `$` means. Read the dup answer and tell me if it helps in this^ case of split. – Royi Namir Oct 11 '18 at 07:09
  • These are basic regex issues, both are covered in the attached links. – Wiktor Stribiżew Oct 11 '18 at 07:13
  • @WiktorStribiżew I agree that those are basic operators, but in this situation I found it hard to explain how `$` (the end anchor) affects the result. No need to downvote because of a different opinion. There is nothing wrong with the question as a question. – Royi Namir Oct 11 '18 at 07:23
  • @WiktorStribiżew if you look [at this](https://i.imgur.com/qGOFen3.jpg) you will see that I pretty much know what those operators do. Saying that you _think_ I'm asking "a tutorial question" is rude. Just because it's a simple answer to you doesn't mean it's a simple answer for me. You can't tell other people that their questions are tutorial level just because it's easy for you. This is not the SO spirit. – Royi Namir Oct 11 '18 at 07:35

1 Answer


Why does removing $ affect the result?

The $ ensures that the end of the string occurs after some number of repetitions of two characters. Without it, a split location is any position followed by at least two characters, which is every position except the one just before the final character (and the end of the string itself); the zero-width match at index 0 never causes a cut. So the $ is required to chunk the string properly. When there is an odd number of characters between some position and the end of the string, you want the lookahead to fail, so that (for example) characters 0 and 1 are not split apart, characters 2 and 3 are not split apart, and so on.
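The difference can be checked by listing where each lookahead matches. `indicesOf` is a hypothetical helper for this sketch; it relies on `String.prototype.matchAll`, which requires the `g` flag and steps past empty matches on its own:

```javascript
// Hypothetical helper: every index where a (zero-width) global pattern matches.
const indicesOf = (s, re) => [...s.matchAll(re)].map((m) => m.index);

const str = '031745';
const withEnd = indicesOf(str, /(?=(?:..)+$)/g);  // [0, 2, 4]
const withoutEnd = indicesOf(str, /(?=(?:..)+)/g); // [0, 1, 2, 3, 4]

// Index 0 never produces a cut, so the remaining indices are the split points.
console.log(withEnd, str.split(/(?=(?:..)+$)/));
console.log(withoutEnd, str.split(/(?=(?:..)+)/));
```

Without `$`, every index that still has two characters ahead qualifies, which is exactly why the split falls apart into single characters until the final "45".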

Why does changing the inner group to a capturing group yield a different result?

When you use a capturing group inside of split, whatever is captured is included in the resulting array as an extra item, between the parts of the string before and after the split point. For example:

console.log('foobar'.split(/(bar)/)); // ["foo", "bar", ""]

Here, the string is split on bar. Without a capturing group, it would result in ['foo', '']:

console.log('foobar'.split(/(?:bar)/)); // ["foo", ""]

But because bar was captured, it's added in between. The same thing is occurring in your code:

'031745'.split(/(?=(..)$)/);

The final 45 is captured, so it is included in the result as its own item. But because the capture sits inside a lookahead, the 45 is not consumed by the split: the string is cut at the zero-width position between 0317 and 45, so 45 also appears again as the final portion of the string.

[
  "0317", // Initial portion of the string
  "45", // Captured group
  "45" // Final portion of the string
]
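To see both effects in one place, here is a simplified model of the loop `String.prototype.split` runs for a regex separator. `sketchSplit` is a hypothetical helper written for illustration; it roughly follows the ECMAScript algorithm but ignores the `limit` argument, the separator's own flags, and the empty-input special case. Two details matter here: a zero-width match at the very start of the current piece is skipped, and any capturing groups are pushed between the pieces.

```javascript
// Hypothetical, simplified model of String.prototype.split with a regex
// separator. Ignores `limit`, the separator's flags and the empty-input case.
function sketchSplit(str, separator) {
  // Sticky copy: each attempt must match exactly at lastIndex.
  const re = new RegExp(separator.source, 'y');
  const out = [];
  let pieceStart = 0; // where the piece currently being built begins
  let q = 0;          // candidate split position

  while (q < str.length) {
    re.lastIndex = q;
    const m = re.exec(str);
    // No match here, or an empty match right at the piece start: move on.
    if (m === null || re.lastIndex === pieceStart) {
      q++;
      continue;
    }
    out.push(str.slice(pieceStart, q)); // text before the separator
    out.push(...m.slice(1));            // captured groups are spliced in
    pieceStart = re.lastIndex;          // after the separator (q itself for a lookahead)
    q = pieceStart;
  }
  out.push(str.slice(pieceStart)); // remainder after the last cut
  return out;
}

console.log(sketchSplit('031745', /(?=(?:..)+$)/)); // result: ["03", "17", "45"]
console.log(sketchSplit('031745', /(?=(..)$)/));    // result: ["0317", "45", "45"]
console.log(sketchSplit('031745', /(?=(..)+$)/));   // result: ["03", "45", "17", "45", "45"]
```

The last call also shows that a repeated capturing group such as (..)+ keeps only its final iteration, so every cut re-inserts "45"; that is why the chunking pattern needs (?:..) rather than (..).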
CertainPerformance