I'm trying to learn some basic JavaScript regex. To start, I read the documentation and this SO question: How do you access the matched groups in a JavaScript regular expression?

I think I've deciphered most of the expression:

/(?:^|\s)format_(.*?)(?:\s|$)/g

Except this part:

(.*?)

I know that

.*

matches 0 or more occurrences of any character (except line terminators).

But I can't figure out why the

?

is needed.

I was playing with something similar:

/(?:^|\s)ab(.*?)ab(?:\s|$)/
' ab4545ab '

And things have been behaving the same with or without the

?

in

(.*?)

Any thoughts?

Thanks!

Zhao Li
  • The question mark in that context means to do a `lazy` match. – jahroy Jun 12 '12 at 19:17
  • http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/ – Steve Robbins Jun 12 '12 at 19:24
  • do read this - great tutorial, useful examples and all to the point: http://www.regular-expressions.info/tutorial.html, and this in particular http://www.regular-expressions.info/repeat.html#lazy – Joanna Derks Jun 12 '12 at 20:20
  • @Joanna: Thanks for those links. They helped a lot. I was still confused after reading the answers (but of course they all would have pointed me in the right direction), but after reading your link, they all made sense. I wish you had posted this as an answer. – Zhao Li Jun 12 '12 at 23:44

3 Answers

It makes the .* non-greedy (lazy). A lazy .* consumes as little as possible: it stops at the first position where the rest of the regex can match.

Without the ?, the .* is greedy: it consumes as much as it can and only gives characters back until the rest of the regex matches, so it effectively stops at the last such position.

var s = "foo bar boo bar foo";

var greedy = /.*bar/;
var no_greed = /.*?bar/;

greedy.exec(s); // ["foo bar boo bar"]

no_greed.exec(s); // ["foo bar"]

So the greedy one consumes past the first "bar" to the last "bar".

The non-greedy only goes to the first "bar".
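
To tie this back to the test in the question, here is a small sketch of why ' ab4545ab ' behaves the same either way. That string has only one possible closing "ab", so greedy and lazy end up in the same place. The second input string and the variable names are made up for illustration; with two possible closing "ab" sequences the results diverge.

```javascript
// Same pattern as in the question, once greedy and once lazy.
const greedyAb = /(?:^|\s)ab(.*)ab(?:\s|$)/;
const lazyAb = /(?:^|\s)ab(.*?)ab(?:\s|$)/;

// Only one possible closing "ab": both capture the same text.
const single = ' ab4545ab ';
console.log(greedyAb.exec(single)[1]); // "4545"
console.log(lazyAb.exec(single)[1]);   // "4545"

// Hypothetical input with two possible closing "ab" sequences.
const double = ' ab1ab ab2ab ';
console.log(greedyAb.exec(double)[1]); // "1ab ab2"
console.log(lazyAb.exec(double)[1]);   // "1"
```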

A ? placed after a quantifier such as + or * (as in .+? or .*?) makes it lazy instead of the default greedy. That means it will match as few characters as possible rather than as many as possible.

Example:

"hello".match(/.+/)    //Returns ["hello"]
"hello".match(/.+?/)   //Returns ["h"]
Madara's Ghost

The ? makes the quantifier ungreedy. Without it, the * will eat up as many characters as possible, which is particularly powerful with the . wildcard. With the ? there, it will eat as few as necessary.

Take this string, for example: "abcccbacba", and match it against /abc(.*)ba/. It will result in capturing ccbac. On the other hand, /abc(.*?)ba/ will capture cc.
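
The same effect shows up with the pattern from the question. A rough sketch, assuming a made-up input string (the 'class format_abc other' value and the variable names are only for illustration): the greedy group runs to the end of the string because $ can still satisfy the final group, while the lazy group stops at the first whitespace.

```javascript
// Hypothetical input: a format_ token followed by another word.
const input = 'class format_abc other';

// Pattern from the question (without the g flag), greedy and lazy.
const greedyFormat = /(?:^|\s)format_(.*)(?:\s|$)/;
const lazyFormat = /(?:^|\s)format_(.*?)(?:\s|$)/;

console.log(greedyFormat.exec(input)[1]); // "abc other"
console.log(lazyFormat.exec(input)[1]);   // "abc"
```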

Niet the Dark Absol