
I am parsing some text with JavaScript. Let's say I have some string:

"hello wold <1> this is some random text <3> foo <12>"

I need to place the following sub strings in an array:

myArray[0] = "hello wold ";
myArray[1] = "<1>";
myArray[2] = " this is some random text ";
myArray[3] = "<3>";
myArray[4] = " foo ";
myArray[5] = "<12>";

Note that I am splitting the string whenever I encounter a <"number"> sequence.

I have tried splitting the string with the regular expression /<\d{1,3}>/, but when I do so I lose the <"number"> sequences. In other words, I end up with "hello wold ", " this is some random text ", " foo ". The strings "<1>", "<3>" and "<12>" are lost, and I would like to keep them. How can I solve this?

Tono Nam
  • possible duplicate of [Javascript - string.split(regex) keep seperators](http://stackoverflow.com/questions/4204210/javascript-string-splitregex-keep-seperators) – outis Feb 19 '12 at 19:34

1 Answer


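You need to capture the sequence to retain it: when the separator pattern is wrapped in a capturing group, split() includes the matched text in the resulting array.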

var str = "hello wold <1> this is some random text <3> foo <12>"

str.split(/(<\d{1,3}>)/);

// ["hello wold ", "<1>", " this is some random text ", "<3>", " foo ", "<12>", ""]
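The trailing "" in that result appears because the string ends with a separator, so split() reports an empty piece after it. If the empty piece is unwanted, one option is to filter it out. A minimal sketch, where splitKeepingTags is a hypothetical helper name and not part of the original answer:

```javascript
// Hypothetical helper: split on <number> tags while keeping the tags themselves.
function splitKeepingTags(text) {
  // The parentheses form a capturing group, so split() keeps each matched tag.
  var parts = text.split(/(<\d{1,3}>)/);
  // Drop the empty strings that appear when the text starts or ends with a tag,
  // or when two tags are adjacent.
  return parts.filter(function (part) {
    return part !== "";
  });
}

var pieces = splitKeepingTags("hello wold <1> this is some random text <3> foo <12>");
console.log(pieces);
// ["hello wold ", "<1>", " this is some random text ", "<3>", " foo ", "<12>"]
```

Note that this relies on the same capturing-group behavior of split() as the line above, so it shares any browser limitations of that approach.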

In case there are issues with the capturing group in some browsers, you could do it manually like this:

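var str = "hello wold <1> this is some random text <3> foo <12>",
    re = /<\d{1,3}>/g,   // global flag so exec() advances through the string
    result = [],
    match,
    last_idx = 0;

// Each exec() call returns the next tag, or null when there are no more
while( ( match = re.exec( str ) ) !== null ) {
   // Push the text before the tag, then the tag itself
   result.push( str.slice( last_idx, match.index ), match[0] );

   last_idx = re.lastIndex;
}
// Push whatever follows the last tag (an empty string if the input ends with a tag)
result.push( str.slice( last_idx ) );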
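The manual loop can also be wrapped in a reusable function. The following is only a sketch under a few assumptions: splitOnTags is a hypothetical name, and unlike the loop above it skips empty pieces, so the result has no trailing "".

```javascript
// Hypothetical helper: manual split that keeps <number> tags
// without relying on capturing groups in split().
function splitOnTags(text) {
  // In ES5 and later, a regex literal creates a fresh object on each
  // evaluation, so lastIndex starts at 0 on every call.
  var re = /<\d{1,3}>/g;
  var result = [];
  var lastIdx = 0;
  var match;

  while ((match = re.exec(text)) !== null) {
    // Text between the previous tag (or the start) and this tag, if any.
    if (match.index > lastIdx) {
      result.push(text.slice(lastIdx, match.index));
    }
    result.push(match[0]);
    lastIdx = re.lastIndex;
  }

  // Text after the last tag, if any.
  if (lastIdx < text.length) {
    result.push(text.slice(lastIdx));
  }
  return result;
}

console.log(splitOnTags("hello wold <1> this is some random text <3> foo <12>"));
// ["hello wold ", "<1>", " this is some random text ", "<3>", " foo ", "<12>"]
```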
  • Note that according to [MDN](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/Split#Description) not all browsers support capturing patterns with `.split()` (though of course it doesn't say which ones don't). – nnnnnn Jan 13 '12 at 00:32
  • @nnnnnn: Interesting, I wonder which ones. To be safe, I updated with a different solution. –  Jan 13 '12 at 00:42