JavaScript global regex

Question

I'm trying to get a regular expression working in JavaScript. It should match the following format, and I want to find every occurrence of it globally in a string:

[[foo=bar:something]]

foo, bar and something can be any alphanumeric characters.

The expression I tried is the following (the many \s* are there to tolerate stray whitespace):

var string = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";
var regex = new RegExp("\\[\\[\\s*.+\\s*=\\s*.+\\s*:\\s*.+\\s*\\]\\]", "g");
console.log(string.match(regex));

Which gives me:

[  "[[abc=foo:bar]] some other stuff [[foo=bar:abc]]"  ]

As the single match.

My question is: how can I match each occurrence of this pattern on its own, rather than everything between the first [[ and the last ]], and get this result:

[  "[[abc=foo:bar]]", "[[foo=bar:abc]]"  ]

`.+` will consume characters greedily. A quick solution would be to change them to `.+?`, but that would not be a rigid solution. – nhahtdh, Jan 31 '14 at 20:14
Thanks, that did the trick! Could you explain in a few words what the ? at the end does? – Balázs Édes, Jan 31 '14 at 20:19
`.+` matches any character except line terminators (whitespace included), which covers a LOT more than alphanumerics. `\w+` would match "word characters", which are alphanumerics and the underscore character (`_`). If the underscore is not acceptable, then you will need to use `[a-zA-Z0-9]+` (or, if you are okay with the regex being case-insensitive, you could use `[a-z0-9]+` and include the `i` flag). – talemyn, Jan 31 '14 at 20:21
It makes the quantifier lazy, so it will try to consume the fewest characters possible while matching. – nhahtdh, Jan 31 '14 at 20:21
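
To make the greedy/lazy difference from the comments above concrete, here is a minimal sketch using the string from the question. The patterns are deliberately shortened to just the brackets so that the quantifier is the only thing that changes; the variable names are only for illustration.

```javascript
// Same input string as in the question.
const input = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";

// Greedy: .+ takes as much as it can, so the match runs up to the LAST ]].
const greedy = input.match(/\[\[.+\]\]/g);

// Lazy: .+? takes as little as it can, so each match stops at the nearest ]].
const lazy = input.match(/\[\[.+?\]\]/g);

console.log(greedy); // [ '[[abc=foo:bar]] some other stuff [[foo=bar:abc]]' ]
console.log(lazy);   // [ '[[abc=foo:bar]]', '[[foo=bar:abc]]' ]
```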

score 1 · Accepted Answer · edited May 23 '17 at 10:31

As it stands right now, you've got a lot of extraneous matching going on here. All of the \s* can just be removed, since the .+ next to each of them already matches whitespace.

This is functionally equivalent to what you currently have, mostly since you're not capturing any of the matches right now:

var string = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";
var regex = new RegExp("\\[\\[.+=.+:.+\\]\\]", "g");
console.log(string.match(regex));

What your question is asking for is for your regex to be "non-greedy," or "lazy," which means each wildcard matches as few characters as possible, so every match ends at the first ]] that completes the pattern instead of the last one. Take a look at this question or this tutorial for more help on that.

tl;dr: change your wildcards to .+?

var string = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";
var regex = /\[\[.+?=.+?:.+?\]\]/g
console.log(string.match(regex));
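
If the three parts also need to be pulled out individually, capture groups can be added to the lazy pattern. This is only a sketch, assuming a runtime that supports `String.prototype.matchAll` (ES2020); the property names `key`, `value`, and `extra` are made up for illustration and are not from the question.

```javascript
// Same input string as in the question.
const input = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";

// Lazy pattern from above, with one capture group per part.
const pattern = /\[\[(.+?)=(.+?):(.+?)\]\]/g;

// matchAll yields one match array per occurrence; index 0 is the full match,
// indexes 1..3 are the capture groups, which are destructured here.
const parts = Array.from(input.matchAll(pattern), ([, key, value, extra]) => ({
  key,
  value,
  extra,
}));

console.log(parts);
// [ { key: 'abc', value: 'foo', extra: 'bar' },
//   { key: 'foo', value: 'bar', extra: 'abc' } ]
```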

score 0 · Answer 2 · edited Feb 01 '14 at 07:09

\[\[\s*\w+\s*=\s*\w+\s*:\s*\w+\s*\]\]

That regex should work for what you're doing. The problem was that .+ was matching everything in the rest of the string. You needed a character set that would match everything up until the equals sign without consuming it, and \w (which is equivalent to [A-Za-z0-9_]) does just that.
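
A quick way to try the pattern above, including the stray whitespace it is meant to tolerate. The test string `spaced` is invented for this sketch. It also shows the same pattern passed to `new RegExp` as a string, where every backslash has to be doubled because the string literal consumes one level of escaping.

```javascript
// Hypothetical input with extra whitespace inside the first tag.
const spaced = "x [[ abc = foo : bar ]] y [[foo=bar:abc]] z";

// The pattern from this answer, written as a regex literal.
const literal = /\[\[\s*\w+\s*=\s*\w+\s*:\s*\w+\s*\]\]/g;

// The same pattern as a string: "\\s" in the string becomes \s in the regex.
const fromString = new RegExp(
  "\\[\\[\\s*\\w+\\s*=\\s*\\w+\\s*:\\s*\\w+\\s*\\]\\]",
  "g"
);

console.log(spaced.match(literal));
// [ '[[ abc = foo : bar ]]', '[[foo=bar:abc]]' ]
console.log(literal.source === fromString.source); // true
```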

answered Jan 31 '14 at 20:25 by Samuel Reid; edited Feb 01 '14 at 07:09 by nhahtdh

Works when I tested it at regexpal.com – Samuel Reid Jan 31 '14 at 20:26
thank you Samuel, but with charlieg's answer it was easier to group the words, and extract them – Balázs Édes Jan 31 '14 at 20:35
My version keeps the greedy quantifiers, but whichever one is easier for you will work fine. – Samuel Reid Jan 31 '14 at 20:37
You can't write `[\w^=]` if you want to allow word characters and exclude `=`! The `^` is only treated as a special character at the start of a character class; anywhere else it is a plain literal. You can replace it with: `[^\W=]` – Casimir et Hippolyte Jan 31 '14 at 21:03
Hmm. Thanks. Not really warmed up on my regexes. – Samuel Reid Jan 31 '14 at 21:26
@CasimiretHippolyte: `\w` by default does not match `=`, so the whole `[^\W=]` is equivalent to `\w` – nhahtdh Jan 31 '14 at 21:36
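
The three character classes discussed in these comments can be compared directly. A small sketch, with the patterns anchored by `^...$` so that the whole test string has to match (the variable names are illustrative only):

```javascript
// ^ is not first in the class, so this allows word characters, a literal ^, and a literal =.
const loose = /^[\w^=]+$/;

// Negated class: anything that is neither a non-word character nor =, i.e. word characters only.
const strict = /^[^\W=]+$/;

// Plain \w never matched = in the first place.
const plain = /^\w+$/;

console.log(loose.test("a=b"), loose.test("a^b"));   // true true
console.log(strict.test("a=b"), plain.test("a=b"));  // false false
console.log(strict.test("abc_1"), plain.test("abc_1")); // true true
```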

score 0 · Answer 3 · answered Feb 01 '14 at 07:18

You already got an answer, so in short: .* is generally a bad idea. Try to be more specific if you can:

var regex = /\[\[[^=]+=[^:]+:[^\]]+\]\]/g

Asking for "anything but the next expected character" reduces backtracking when nothing is found.
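
Besides backtracking, the negated classes also behave differently from the lazy `.+?` version on malformed input. The sketch below uses an invented string in which the first tag is never closed properly; the lazy pattern stretches across it, while the negated classes refuse to cross a `]`.

```javascript
// Hypothetical input: the first tag ends in a single ] instead of ]].
const broken = "[[a=b:c] d [[e=f:g]]";

// Lazy wildcards keep expanding until some ]] is found, even past the stray ].
const lazy = broken.match(/\[\[.+?=.+?:.+?\]\]/g);

// Negated classes cannot run past =, : or ], so the broken tag is skipped.
const negated = broken.match(/\[\[[^=]+=[^:]+:[^\]]+\]\]/g);

console.log(lazy);    // [ '[[a=b:c] d [[e=f:g]]' ]
console.log(negated); // [ '[[e=f:g]]' ]
```

Note that `[^=]`, `[^:]` and `[^\]]` still accept non-alphanumeric characters such as spaces, so if the parts must be strictly alphanumeric a class like `[a-zA-Z0-9]+` would be the tighter choice.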

answered Feb 01 '14 at 07:18 by Jan
