I'm having some trouble translating my working C# regular expression into JavaScript's regular expression implementation.

Here's the regular expression:

([a-z]+)((\d+)([a-z]+))?,?

When used on "water2cups,flour4cups,salt2teaspoon" you should get:

[
    ["water", "2cups", "2", "cups"],
    ["flour", "4cups", "4", "cups"],
    ["salt", "2teaspoon", "2", "teaspoon"]
]

... And it does. In C#. But not in JavaScript.

I know there are some minor differences across implementations. What am I missing to get this expression working in JavaScript?

Update

I am using the regex like so:

"water2cups,flour4cups,salt2teaspoon".match(/([a-z]+)((\d+)([a-z]+))?,?/g);
cllpse
  • Re your update: If you use a `RegExp#exec` loop rather than `String#match`, you get the results you're expecting (see my answer). I'm not enough of a RegExp guru to tell you why. :-) – T.J. Crowder May 05 '10 at 12:11

2 Answers

Creating the RegExp

You haven't shown how you're creating your JavaScript regular expression. For example, are you using a literal:

var rex = /([a-z]+)((\d+)([a-z]+))?,?/;

or a string

var rex = new RegExp("([a-z]+)((\\d+)([a-z]+))?,?");

If the latter, note that I've escaped the backslash.
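To make the escaping point concrete, here is a minimal sketch comparing the two forms. The variable names (fromLiteral, fromString, unescaped) are just illustrative; the third case shows what happens if the backslash is not doubled inside a string.

```javascript
// A regex literal and the equivalent regex built from a string.
var fromLiteral = /([a-z]+)((\d+)([a-z]+))?,?/;
var fromString = new RegExp("([a-z]+)((\\d+)([a-z]+))?,?");

// Both end up with the same pattern source.
console.log(fromLiteral.source === fromString.source); // true

// Without the doubled backslash, the string literal "\d" is just "d",
// so the pattern looks for the letter d instead of a digit.
var unescaped = new RegExp("([a-z]+)((\d+)([a-z]+))?,?");
console.log(unescaped.source); // ([a-z]+)((d+)([a-z]+))?,?
```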

Global Flag

By default, JavaScript regular expressions are not global, which may be an issue for you. Add the g flag if you don't already have it:

var rex = /([a-z]+)((\d+)([a-z]+))?,?/g;

or

var rex = new RegExp("([a-z]+)((\\d+)([a-z]+))?,?", "g");

Using RegExp#exec rather than String#match

Your edit says you're using String#match to get an array of matches. I have to admit I hardly ever use String#match (I use RegExp#exec, as below.) When I use String#match with your regex, I get...very odd results that vary from browser to browser. Using a RegExp#exec loop doesn't do that, so that's what I'd do.
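For what it's worth, in current engines that follow the ECMAScript specification, String#match behaves differently depending on the g flag: with g it returns only the full matches and drops the capture groups, and without g it returns the first match together with its groups. A small sketch of that difference (the variable names are illustrative, and older browsers may not have behaved this consistently):

```javascript
var input = "water2cups,flour4cups,salt2teaspoon";

// With the g flag: an array of full matches only, no capture groups.
var withGlobal = input.match(/([a-z]+)((\d+)([a-z]+))?,?/g);
console.log(withGlobal); // ["water2cups,", "flour4cups,", "salt2teaspoon"]

// Without the g flag: only the first match, but with its capture groups.
var withoutGlobal = input.match(/([a-z]+)((\d+)([a-z]+))?,?/);
console.log(withoutGlobal[0]); // water2cups,
console.log(withoutGlobal[1]); // water
console.log(withoutGlobal[4]); // cups
```

Neither call returns every match with its groups, which is why a RegExp#exec loop is the better fit here.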

Working Example

This code does what you're looking for:

var rex, str, match, index;

rex = /([a-z]+)((\d+)([a-z]+))?,?/g;
str = "water2cups,flour4cups,salt2teaspoon";

rex.lastIndex = 0; // Workaround for bug/issue in some implementations (they cache literal regexes and don't reset the index for you)
while (match = rex.exec(str)) {
    log("Matched:");
    for (index = 0; index < match.length; ++index) {
        log("&nbsp;&nbsp;match[" + index + "]: |" + match[index] + "|");
    }
}

(The log function just appends text to a div.)

My output for that is:

Matched:
  match[0]: |water2cups,|
  match[1]: |water|
  match[2]: |2cups|
  match[3]: |2|
  match[4]: |cups|
Matched:
  match[0]: |flour4cups,|
  match[1]: |flour|
  match[2]: |4cups|
  match[3]: |4|
  match[4]: |cups|
Matched:
  match[0]: |salt2teaspoon|
  match[1]: |salt|
  match[2]: |2teaspoon|
  match[3]: |2|
  match[4]: |teaspoon|

(Recall that in JavaScript, match[0] will be the entire match; then match[1] and so on are your capture groups.)
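If you want the nested array from the question rather than logged output, one way is to collect the capture groups inside the same kind of RegExp#exec loop. This is only a sketch; the names pattern, recipe, and rows are mine, not from the question.

```javascript
var pattern = /([a-z]+)((\d+)([a-z]+))?,?/g;
var recipe = "water2cups,flour4cups,salt2teaspoon";
var rows = [];
var m;

// exec advances pattern.lastIndex on each call and returns null when done.
while ((m = pattern.exec(recipe)) !== null) {
    // Drop m[0] (the entire match) and keep only the capture groups.
    rows.push(m.slice(1));
}

console.log(JSON.stringify(rows));
// [["water","2cups","2","cups"],["flour","4cups","4","cups"],["salt","2teaspoon","2","teaspoon"]]
```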

T.J. Crowder

C# has the "@" verbatim string prefix, which means backslashes (\) don't need to be escaped. I do not think that JavaScript supports it, so you basically need to "escape" the backslash by putting in another one. This should do the trick:

([a-z]+)((\\d+)([a-z]+))?,?
npinti
  • You only have to escape backslashes if you're using a string to create the regex, not if you're using literal notation. – T.J. Crowder May 05 '10 at 11:29