I often want to parse a string with a regular expression and get back all the matches plus all the non-matching substrings, interspersed in their original order, e.g.

var parsed = regexParse(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');

And so parsed will contain:

0 : "Hello "
1 : match containing {name}, name
2 : ", you are "
3 : match containing {age}, age
4 : " years old"

Is there anything in JavaScript (or some widely used library) that resembles this regexParse function? I wrote my own version, but it seems so obvious that I suspect there must already be a "standard" way of doing it:

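var regexParse = function(rx, str) {
  // Assumes rx has the g flag; without it exec keeps returning the first match.
  var nextPlain = 0, result = [], match;
  rx.lastIndex = 0; // start from the beginning even if rx was used before
  for (;;) {
    match = rx.exec(str);
    if (!match) {
      // no more matches: the rest of the string is plain text
      result.push(str.slice(nextPlain));
      break;
    }
    // plain text between the previous match and this one
    result.push(str.slice(nextPlain, match.index));
    nextPlain = rx.lastIndex;
    result.push(match);
  }
  return result;
};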
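
For comparison, here is a minimal sketch of the same idea built on String.prototype.matchAll. That method arrived in ES2020, so it postdates this question. The name regexParseAll is hypothetical, and the sketch assumes rx carries the g flag, which matchAll requires.

```javascript
// Hypothetical variant of regexParse using matchAll (ES2020).
// Assumes rx has the global flag; matchAll throws a TypeError otherwise.
function regexParseAll(rx, str) {
  var result = [], nextPlain = 0;
  for (const match of str.matchAll(rx)) {
    // Plain text between the previous match and this one (may be empty).
    result.push(str.slice(nextPlain, match.index));
    result.push(match);
    nextPlain = match.index + match[0].length;
  }
  // Trailing plain text after the last match.
  result.push(str.slice(nextPlain));
  return result;
}

var parsed = regexParseAll(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');
// Flatten for display: keep plain text as-is, show the first capture for matches.
var kinds = parsed.map(function (item) {
  return typeof item === 'string' ? item : item[1];
});
// kinds is ["Hello ", "name", ", you are ", "age", " years old"]
```

As in the original, plain text and match objects can be told apart with typeof, since each match is an array rather than a string.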

Update

Regarding Denis's answer, at first I thought it wouldn't help, because all the values in the returned array are strings. How can you tell which items are unmatched text and which are from the matches?

But a bit of experimentation (with IE9 and Chrome anyway) suggests that when split is used in this way (with a single capturing group), it always alternates the pieces: the first is plain text, the second is a match, the third is plain text, and so on. It follows this rule even if there are two matches with no unmatched text between them; in such cases it outputs an empty string.

Even in the trivial case:

'{x}'.split(/{([^}]+)}/g)

The output is strictly:

["", "x", ""]

So you can tell which is which if you know how (and if this assumption holds)!
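
The alternation follows from how split treats capturing groups: each match contributes the text before it plus one element per capturing group. Here is a short sketch of two edge cases, assuming a standards-conforming engine (the variable names are only illustrative). Adjacent matches produce an empty string between them, and a second capturing group breaks the simple odd/even rule.

```javascript
// Adjacent matches: an empty string still marks the (empty) plain text between them.
var adjacent = '{a}{b}'.split(/{([^}]+)}/g);
// adjacent is ["", "a", "", "b", ""]

// Two capturing groups: each match now contributes two elements,
// so plain text sits at indexes where i % 3 == 0 rather than at even indexes.
var twoGroups = 'x=1;y=2'.split(/(\w)=(\d)/);
// twoGroups is ["", "x", "1", ";", "y", "2", ""]
```

So the odd/even test is safe only when the pattern has exactly one capturing group.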

I like to use the ES5 array methods map, forEach and filter. So with my original regexParse it was a matter of using typeof i == 'string' to detect which items were unmatched text.

With split it has to be determined from the position in the returned array, but that's okay because the ES5 array methods pass the index as the second argument to the callback, so we just need to find out if it's odd (a match) or even (unmatched text). So for example, if we have:

var ar = '{greeting} {name}, you are {age} years old'.split(/{([^}]+)}/g);

Now ar contains:

["", "greeting", " ", "name", ", you are ", "age", " years old"]

From that we can get just the matches:

ar.filter(function(s, i) { return i % 2 != 0; });

>>> ["greeting", "name", "age"]

Or just the plain text, stripping out empty strings also:

ar.filter(function(s, i) { return (i % 2 == 0) && s; });

>>> [" ", ", you are ", " years old"]
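
Putting the index rule to work, the following sketch fills in a template by mapping over the split output. The render function and the values object are hypothetical names introduced for illustration. It assumes a pattern with exactly one capturing group, so that odd indexes are always captures.

```javascript
// Hypothetical helper: substitute {key} placeholders using the odd/even rule.
// Assumes exactly one capturing group in the pattern.
function render(template, values) {
  return template.split(/{([^}]+)}/g).map(function (s, i) {
    // Odd index: s is a captured key; even index: s is literal text.
    return i % 2 ? String(values[s]) : s;
  }).join('');
}

var text = render('Hello {name}, you are {age} years old', { name: 'Ann', age: 30 });
// text is "Hello Ann, you are 30 years old"
```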
Daniel Earwicker
    +1 perhaps not for the question exactly, but for asking a question that resulted in such a fantastic answer. Super excited about that. – Ben Aug 22 '13 at 00:41

1 Answer

I think you're looking for split() with capturing parentheses:

var myString = "Hello 1 word. Sentence number 2.";
var splits = myString.split(/(\d)/); // ["Hello ", "1", " word. Sentence number ", "2", "."]
Denis de Bernardy
  • awesome, I never heard or seen capturing parens with split before, this is really useful – qwertymk Jun 17 '11 at 12:25
  • +1 That's it - perfect. See my updated answer for some further info. – Daniel Earwicker Jun 17 '11 at 17:01
  • Mind==blown, to use the parlance of our times. To use the parlance of other times, neato. Spiffy. Top notch. This is very cool. One of those wish-upvote-twice moments. Thanks! – Ben Aug 22 '13 at 00:40