12

Let's say I have a string: "We.need..to...split.asap". I would like to split the string on the delimiter `.`, but only on the first `.` of each run, keeping any further consecutive `.`s at the start of the following token.

Expected output:

["We", "need", ".to", "..split", "asap"]

In other languages, I know that this is possible with a look-behind, `/(?<!\.)\./`, but JavaScript unfortunately does not support that feature.
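For reference, here is a minimal sketch of what the look-behind version produces. It assumes a runtime that implements ECMAScript 2018 look-behind assertions, which did not exist in JavaScript when this was asked; the variable names are illustrative only.

```javascript
// Assumption: the engine supports ES2018 look-behind assertions.
var input = "We.need..to...split.asap";

// Split only on a "." that is not itself preceded by a ".".
var tokens = input.split(/(?<!\.)\./);

console.log(JSON.stringify(tokens));
// prints ["We","need",".to","..split","asap"]
```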

I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?

I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?

Thanks for the help!

DRAB
  • `"We.need..to...split.asap".split(/\b\./)`, but this only works if the first `.` is preceded by a word character. – nhahtdh Jun 03 '15 at 04:40

3 Answers

5

Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:

var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });

It relies on the fact that when the split expression includes a capture group, the captured text is included in the returned array. Those captures are nearly all we are interested in: the ordinary split pieces are empty strings, which we filter out, except for the final piece, which holds the last token.
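To make the role of the capture group concrete, this sketch shows the array before and after filtering. The names `raw` and `items` are only for illustration.

```javascript
var text = "We.need.to...split.asap";

// Each match contributes an empty split piece followed by its captured group;
// the unmatched remainder ("asap") arrives as the final ordinary piece.
var raw = text.split(/(\.*[^.]+)\./);
console.log(JSON.stringify(raw));
// prints ["","We","","need","","to","","..split","asap"]

// Dropping the empty strings leaves just the tokens.
var items = raw.filter(function (val) { return val.length > 0; });
console.log(JSON.stringify(items));
// prints ["We","need","to","..split","asap"]
```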

EDIT: Unfortunately there is one slight bug with this: if the text to be split starts with a delimiter, that delimiter is included in the first token. If that's an issue, it can be remedied with:

var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });

(I think this regex is ugly and would welcome an improvement.)
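A short comparison of the two expressions on input with a leading delimiter, as a sketch (the sample string and the names `naive` and `fixed` are made up for illustration). The filter has to become `!!val` because the capture group is `undefined` whenever the `^` branch matches, and `undefined.length` would throw.

```javascript
var text = ".We.need.to...split.asap";

// Original expression: the leading "." is swallowed into the first token.
var naive = text.split(/(\.*[^.]+)\./)
    .filter(function (val) { return val.length > 0; });
console.log(JSON.stringify(naive));
// prints [".We","need","to","..split","asap"]

// Amended expression: the "^" branch consumes the leading "." and captures nothing.
var fixed = text.split(/(?:^|(\.*[^.]+))\./)
    .filter(function (val) { return !!val; });
console.log(JSON.stringify(fixed));
// prints ["We","need","to","..split","asap"]
```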

Ted Hopp
3

You can do this without any lookaheads:

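var subject = "We.need.to....split.asap";
// Consume at most one delimiter, then capture any further delimiters plus the token text.
var regex = /\.?(\.*[^.]+)/g;

var matches, output = [];

// With the g flag, each exec() call resumes from regex.lastIndex.
while ((matches = regex.exec(subject)) !== null) {
    output.push(matches[1]);
}

document.write(JSON.stringify(output));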

It looked like it would work in one line, as it did on https://regex101.com/r/cO1dP3/1, but it had to be expanded into the code above because, with the /g flag, .match returns only the full matches and not the capturing groups (i.e. the correct data was in the capturing groups, but we couldn't access them without a loop like the one above).

See: JavaScript Regex Global Match Groups
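To illustrate the difference, here is a sketch contrasting what `.match` returns under the /g flag with the captured groups. The second half assumes a runtime that provides `String.prototype.matchAll` (ES2020), which was not available when this answer was written; the names `whole` and `groups` are illustrative.

```javascript
var subject = "We.need.to....split.asap";

// With /g, match() returns whole matches, so the consumed delimiter is still attached.
var whole = subject.match(/\.?(\.*[^.]+)/g);
console.log(JSON.stringify(whole));
// prints ["We",".need",".to","....split",".asap"]

// Assumption: matchAll() is available; it yields one match object per hit, groups included.
var groups = Array.from(subject.matchAll(/\.?(\.*[^.]+)/g), function (m) {
    return m[1];
});
console.log(JSON.stringify(groups));
// prints ["We","need","to","...split","asap"]
```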

An alternative solution that keeps the original one-liner (plus one line to strip the leading delimiter from each match) is:

document.write(JSON.stringify(
    "We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
        .map(function(s) { return s.replace(/^\./, ''); })
));

Take your pick!

Bilal Akil
  • This isn't what OP wants, which is to include **all but one** preceding delimiter in each token. (In other words, the result should be `["We","need","to","...split","asap"]`.) – Ted Hopp Jun 03 '15 at 02:57
  • I know, there was a problem when moving the regex from regex101.com to here. Should be working now, but no longer 1 line :( – Bilal Akil Jun 03 '15 at 03:01
2

Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in revision 1 of the question, which was not very clear about such cases.


var text = "We.need.to..split.asap";
// split on each "." that is immediately followed by another "."
var res = text.split(/\.(?=\.)/).map(function(val) {
  // if `val` does not begin with ".", split on every "."
  // else split only on the last "." (the one with no "." anywhere after it)
  return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/);
});
// concatenate the arrays `res[0]` and `res[1]`
res = res[0].concat(res[1]);

document.write(JSON.stringify(res));
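The limitation noted at the top of this answer can be seen by wrapping the same logic in a function and feeding it three consecutive delimiters. This is only a sketch; `splitTwoMax` is a hypothetical name, not part of the original answer.

```javascript
// Hypothetical wrapper around the approach above, for comparison only.
function splitTwoMax(text) {
  var res = text.split(/\.(?=\.)/).map(function (val) {
    return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/);
  });
  return res[0].concat(res[1]);
}

console.log(JSON.stringify(splitTwoMax("We.need.to..split.asap")));
// prints ["We","need","to",".split","asap"]

// With three delimiters the first split yields an empty middle piece,
// so the tokens after it are lost.
console.log(JSON.stringify(splitTwoMax("We.need.to...split.asap")));
// prints ["We","need","to",""]
```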
guest271314
  • It's clever, but cannot handle any more than 2 consecutive delimiters. ex: "we.need.to...split.asap". I'll up-vote it, though, since that wasn't specifically clear in the question's example. – DRAB Jun 03 '15 at 02:31
  • @DRAB Perhaps include _"handle any more than 2 consecutive delimiters. ex: "we.need.to...split.asap""_, _"since that wasn't specifically clear in the question's example"_, in the question? – guest271314 Jun 03 '15 at 02:32
  • More than two delimiters was implied by OP's use of the plural: "any recurring `.`s". – Ted Hopp Jun 03 '15 at 02:53
  • I will remove the downvote if you update your answer to match the description, or add a caveat section about your solution. – nhahtdh Jun 03 '15 at 04:51
  • @guest271314: I'm aware, so if you don't want to spend more time updating your answer, a caveat or a warning suffices – nhahtdh Jun 03 '15 at 04:55