Possible Duplicate:
Regex question mark

I am trying to figure out how to parse out the text that is inside the parentheses and between the single quotes in a statement. For example, if I have the following statement:

(I have a 'cat', 'hat');

I want the result to be

cat
hat

I managed to figure it out by experimenting with the different metacharacters described on Wikipedia (http://en.wikipedia.org/wiki/Regular_expression), but I still have trouble understanding why it works.

I've tried this: \'(.*)\'

My understanding of this regex: I want the characters between the single quotes ', where . matches any single character and * repeats it zero or more times.

This resulted in:

cat', 'hat

After playing around with a bunch of regexes, I finally ended up with this by accident: \'(.*?)\'

This resulted in:

cat
hat

Why does this work? (In particular I don't understand how the '?' works.)

user459811
    http://www.regular-expressions.info/repeat.html , especially "Laziness Instead of Greediness" section – Wiseguy Jun 25 '12 at 21:54

4 Answers

By default, regex quantifiers are greedy. By adding a question mark after a quantifier, you change it to "lazy" mode. This means that it matches as few characters as possible while still letting the rest of the pattern match.
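
As a quick illustration, here is a minimal sketch in Python (the question does not say which language or regex flavor was used, so Python's `re` module is an assumption) that runs both patterns from the question against the sample statement:

```python
import re

text = "(I have a 'cat', 'hat');"

# Greedy: .* takes as much as it can, so the match runs to the LAST quote.
greedy = re.findall(r"'(.*)'", text)

# Lazy: .*? takes as little as it can, so each match stops at the NEXT quote.
lazy = re.findall(r"'(.*?)'", text)

print(greedy)  # ["cat', 'hat"]
print(lazy)    # ['cat', 'hat']
```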

Steve H.

The default behavior of a regular expression is to make the longest possible match in the string. This is referred to as being "greedy."

You are correct that ? normally just means to match the preceding item (e.g., a character) zero or one time, but *? is a special case called "lazy star" that switches the regular expression evaluator into a "lazy" mode for that quantifier. In this mode, the evaluator first tries to skip the preceding item (and complete a match without it), and only "goes back" for it if the rest of the pattern fails to match.

The net result is just what you've observed: it matches the shortest strings that satisfy the search criteria, instead of the default behavior, which finds the longest possible match.
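
The "skip first, go back only if needed" behavior can be seen in a small sketch. This assumes Python's `re` module, and the test strings are made up for illustration:

```python
import re

# With nothing after it, a lazy star is satisfied by matching zero items.
lazy_alone = re.match(r"a*?", "aaa").group()

# The greedy star takes every "a" it can.
greedy_alone = re.match(r"a*", "aaa").group()

# When the rest of the pattern ("b") cannot match yet, the lazy star
# goes back and takes one more "a" at a time until "b" can match.
lazy_then_b = re.match(r"a*?b", "aaab").group()

print(repr(lazy_alone))    # ''
print(repr(greedy_alone))  # 'aaa'
print(repr(lazy_then_b))   # 'aaab'
```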

A handy resource for testing out regular expressions is here, and a nice description of the various options, including lazy star, is here.

Myk Willis
    One thing to note is that not all regular-expression dialects support the lazy quantifier; in particular, neither POSIX nor GNU (neither in basic nor in extended form) nor XML schema regular expressions support it, according to http://www.regular-expressions.info/refflavors.html – mklement0 Jun 25 '12 at 22:36
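
For flavors without a lazy quantifier, a commonly used alternative (not mentioned in the thread, so treat it as a suggestion) is a negated character class: [^']* matches everything up to the next quote without needing laziness. The sketch below uses Python's `re` only to show the result; the pattern itself uses nothing beyond a bracket expression and a star, so it should carry over to those other flavors.

```python
import re

text = "(I have a 'cat', 'hat');"

# [^'] means "any character that is not a single quote", so the greedy
# star cannot run past the closing quote of the current word.
words = re.findall(r"'([^']*)'", text)

print(words)  # ['cat', 'hat']
```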

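This works because:

() groups the part of the match that you want to get back as the result.

. matches any character (in most flavors, any character except a line break).

* lets the preceding item appear 0 or more times.

? on its own makes the preceding item optional, so ab? matches a or ab.

Directly after a quantifier such as *, though, ? does not mean "optional". It makes that quantifier lazy, so (.*?) captures as few characters as possible before the rest of the pattern (here, the closing ') can match.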
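
To keep the two roles of ? apart, here is a small sketch, again assuming Python's `re` module and using made-up test strings:

```python
import re

# ? after a plain item: the "b" is optional.
optional = re.findall(r"ab?", "a ab abb")

# * on its own is greedy and takes every "b".
greedy_star = re.findall(r"ab*", "abbb")

# ? after *: the star becomes lazy and takes no "b" unless it has to.
lazy_star = re.findall(r"ab*?", "abbb")

print(optional)     # ['a', 'ab', 'ab']
print(greedy_star)  # ['abbb']
print(lazy_star)    # ['a']
```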

GarouDan

When * is followed by ?, this makes the * match in lazy mode. By default it works greedily, sucking up as many matching characters as possible. In lazy mode it tries to continue with the rest of the pattern as early as possible.

Got it wrong way round in the first place. Now corrected.