
I'm trying to get a regular expression working in JavaScript. It should match the following format, and I want to find every occurrence of it (globally) in a string:

[[foo=bar:something]] 

foo, bar and something can be any alphanumeric characters.

The expression I tried is the following (the many \s* are there to tolerate stray whitespace between the parts):

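var string = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";
// Inside a string literal, \s must be written "\\s"; a bare "\s" silently becomes a plain "s"
var regex = new RegExp("\\[\\[\\s*.+\\s*=\\s*.+\\s*:\\s*.+\\s*\\]\\]", "g");
console.log(string.match(regex));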

This gives me a single match:

[  "[[abc=foo:bar]] some other stuff [[foo=bar:abc]]"  ]

My question is: how can I match each occurrence of the pattern separately, instead of everything between the first [[ and the last ]], so that I get this result:

[  "[[abc=foo:bar]]", "[[foo=bar:abc]]"  ]
Progo
Balázs Édes
  • `.+` will consume characters greedily. A quick solution would be to change them to `.+?`, but that would not be a robust solution. – nhahtdh Jan 31 '14 at 20:14
  • thanks, that did the trick! Could you explain in a few words what the ? at the end does? – Balázs Édes Jan 31 '14 at 20:19
  • `.+` will match any character except line breaks (spaces, `[`, `]`, `=` and `:` included), which covers a LOT more than alphanumerics. `\w+` matches "word characters", which are alphanumerics plus the underscore (`_`). If the underscore is not acceptable, use `[a-zA-Z0-9]+` (or, if you are okay with the regex being case-insensitive, `[a-z0-9]+` with the `i` flag). – talemyn Jan 31 '14 at 20:21
  • It makes the quantifier lazy, so it tries to consume the fewest characters possible while still matching. – nhahtdh Jan 31 '14 at 20:21
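A small sketch of the greedy/lazy difference described in these comments, run on the question's own string. It uses regex literals (so no double escaping is needed) and drops the `\s*` parts for brevity; the only change between the two patterns is the `?` after each `+`.

```javascript
const input = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";

// Greedy: each .+ first grabs everything it can, then gives characters back
// only until the rest of the pattern fits, so the last .+ reaches the LAST "]]".
const greedy = /\[\[.+=.+:.+\]\]/g;

// Lazy: each .+? starts with a single character and grows only when the rest
// of the pattern fails, so each match ends at the FIRST "]]" that works.
const lazy = /\[\[.+?=.+?:.+?\]\]/g;

const greedyMatches = input.match(greedy);
const lazyMatches = input.match(lazy);

console.log(JSON.stringify(greedyMatches));
// ["[[abc=foo:bar]] some other stuff [[foo=bar:abc]]"]
console.log(JSON.stringify(lazyMatches));
// ["[[abc=foo:bar]]","[[foo=bar:abc]]"]
```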

3 Answers


As it stands, you've got a lot of extraneous matching going on here. All of the \s* can be removed, because .+ already matches whitespace.

Since you're not capturing any of the matches right now, this is functionally equivalent to what you currently have:

var string = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";
var regex = new RegExp("\\[\\[.+=.+:.+\\]\\]", "g");
console.log(string.match(regex));

What you're asking for is a "non-greedy" (or "lazy") regex: each quantifier should match as few characters as possible, so every match ends at the first ]] that completes the pattern. Take a look at this question or this tutorial for more help on that.

tl;dr: change your wildcards to .+?

var string = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";
var regex = /\[\[.+?=.+?:.+?\]\]/g
console.log(string.match(regex));
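Since the pattern above doesn't capture anything, here is a sketch of pulling out the three parts with capture groups and `String.prototype.matchAll` (ES2020; it requires the `g` flag). The property names `whole`, `first`, `second` and `third` are illustrative only.

```javascript
const input = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";

// The same lazy pattern, with each part wrapped in a capture group.
const regex = /\[\[(.+?)=(.+?):(.+?)\]\]/g;

// matchAll yields one match array per occurrence: m[0] is the whole tag and
// m[1]..m[3] are the captured parts (trimmed to drop stray whitespace).
const tags = Array.from(input.matchAll(regex), (m) => ({
  whole: m[0],
  first: m[1].trim(),  // text between "[[" and "="
  second: m[2].trim(), // text between "=" and ":"
  third: m[3].trim(),  // text between ":" and "]]"
}));

console.log(tags);
```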
Community
Charlie G
\[\[\s*\w+\s*=\s*\w+\s*:\s*\w+\s*\]\]

That regex should work for what you're doing. The problem was that .+ matches any character, including =, : and ], so it kept matching through the rest of the string. You need a character class that matches everything up to the equals sign without consuming it, and \w (which is equivalent to [A-Za-z0-9_]) does just that, since it cannot match =, : or ].
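A quick check of this pattern as a regex literal, using the question's string with some stray spaces added inside the second tag. The `messy` input is made up to show the difference from the lazy `.+?` fix: because `\w+` cannot cross `]]`, a malformed tag is skipped instead of being merged into the next match. The last lines follow the comment about underscores, swapping `\w` for `[a-z0-9]` with the `i` flag.

```javascript
// \w+ cannot match "=", ":" or "]", so every part stops at its delimiter
// without needing a lazy quantifier.
const wordRegex = /\[\[\s*\w+\s*=\s*\w+\s*:\s*\w+\s*\]\]/g;
const lazyRegex = /\[\[.+?=.+?:.+?\]\]/g;

const input = "something [[abc=foo:bar]] some other stuff [[ foo = bar : abc ]] else";
const spaced = input.match(wordRegex);
console.log(JSON.stringify(spaced)); // ["[[abc=foo:bar]]","[[ foo = bar : abc ]]"]

// Hypothetical malformed input: "[[oops]]" has no "=" or ":".
const messy = "[[oops]] then [[foo=bar:abc]]";
const wordOnMessy = messy.match(wordRegex);
const lazyOnMessy = messy.match(lazyRegex);
console.log(JSON.stringify(wordOnMessy)); // ["[[foo=bar:abc]]"]
console.log(JSON.stringify(lazyOnMessy)); // ["[[oops]] then [[foo=bar:abc]]"]

// \w also accepts "_"; [a-z0-9]+ with the i flag rejects it.
const alnumRegex = /\[\[\s*[a-z0-9]+\s*=\s*[a-z0-9]+\s*:\s*[a-z0-9]+\s*\]\]/gi;
const mixed = "[[a_b=c:d]] [[A1=b2:C3]]";
const wordOnMixed = mixed.match(wordRegex);
const alnumOnMixed = mixed.match(alnumRegex);
console.log(JSON.stringify(wordOnMixed));  // ["[[a_b=c:d]]","[[A1=b2:C3]]"]
console.log(JSON.stringify(alnumOnMixed)); // ["[[A1=b2:C3]]"]
```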

nhahtdh
Samuel Reid

You already have an answer, so briefly: .* (and likewise .+) is generally a bad idea. Be more specific if you can:

var regex = /\[\[[^=]+=[^:]+:[^\]]+\]\]/g

Asking for "everything except the next expected character" also reduces backtracking when no match is found.
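A sketch checking this pattern on the question's string. The second half uses a made-up malformed input to show one limit: `[^=]` and `[^:]` still accept `[` and `]`, so a tag with no `=` can be swallowed into the next one. The `stricter` variant, which also excludes `]` from the first two parts, is an assumption for illustration, not part of this answer.

```javascript
const input = "something [[abc=foo:bar]] some other stuff [[foo=bar:abc]] else";

// Each part is "one or more characters that are not the next delimiter".
const negated = /\[\[[^=]+=[^:]+:[^\]]+\]\]/g;
const onInput = input.match(negated);
console.log(JSON.stringify(onInput)); // ["[[abc=foo:bar]]","[[foo=bar:abc]]"]

// Hypothetical malformed input: [^=]+ happily runs across "]] then [[".
const messy = "[[oops]] then [[foo=bar:abc]]";
const negatedOnMessy = messy.match(negated);
console.log(JSON.stringify(negatedOnMessy)); // ["[[oops]] then [[foo=bar:abc]]"]

// Illustrative tightening: also forbid "]" before "=" and before ":".
const stricter = /\[\[[^=\]]+=[^:\]]+:[^\]]+\]\]/g;
const stricterOnMessy = messy.match(stricter);
console.log(JSON.stringify(stricterOnMessy)); // ["[[foo=bar:abc]]"]
```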

Jan