
What does this combination of quantifiers *? mean?

Take the following as an example:

([0-9][AB]*?)
Maroun
akaii
  • Does this answer your question? [Reference - What does this regex mean?](/q/22937618/r90527) – outis Oct 04 '22 at 18:13

2 Answers


It's a non-greedy match. In [AB]*?, the regex looks for as few occurrences of [AB] as needed to make the overall regex match the searched string, whereas the greedy version [AB]* looks for as many occurrences as possible. It is a feature of Perl's regexes, and hence available in PCRE (Perl Compatible Regular Expressions) (see repetition) and other systems that look to Perl for their definition.
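As a rough illustration of the difference, here is a minimal Python sketch using the standard `re` module, which supports the same `*?` syntax. The sample string "9ABAB" and the trailing literal `B` in the patterns are assumptions added for the demonstration; they are not part of the question.

```python
import re

text = "9ABAB"  # hypothetical sample string

# Lazy: [AB]*? takes as few characters as it can, growing one at a time
# until the literal B that follows it is able to match.
lazy = re.search(r"[0-9][AB]*?B", text)

# Greedy: [AB]* takes as many characters as it can, then backtracks
# just far enough for the literal B that follows it to match.
greedy = re.search(r"[0-9][AB]*B", text)

print(lazy.group(0))    # 9AB
print(greedy.group(0))  # 9ABAB
```

Both patterns match the string; they differ only in how much of it the quantified part claims.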

The PCRE page gives an example:

The classic example of where [greediness] gives problems is in trying to match comments in C programs. These appear between /* and */ and within the comment, individual * and / characters may appear. An attempt to match C comments by applying the pattern:

/\*.*\*/

to the string

/* first comment */  not comment  /* second comment */

fails, because it matches the entire string owing to the greediness of the .* item.

If a quantifier is followed by a question mark, it ceases to be greedy, and instead matches the minimum number of times possible, so the pattern

/\*.*?\*/

does the right thing with the C comments.
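The quoted example can be reproduced with a short Python sketch. It assumes Python's `re` module rather than PCRE itself, and the variable names are made up for illustration.

```python
import re

source = "/* first comment */  not comment  /* second comment */"

# Greedy: .* runs to the end of the line and backtracks to the LAST */
greedy_hits = re.findall(r"/\*.*\*/", source)

# Lazy: .*? stops at the FIRST */ it can reach
lazy_hits = re.findall(r"/\*.*?\*/", source)

print(greedy_hits)  # ['/* first comment */  not comment  /* second comment */']
print(lazy_hits)    # ['/* first comment */', '/* second comment */']
```

Note that in Python `.` does not match a newline by default, so comments spanning several lines would also need the `re.DOTALL` flag.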

Community
Jonathan Leffler

Jonathan already explained the difference, but here's an example that might help you understand what's happening here.

Given the string "9AB":

  • ([0-9][AB]*?) matches only "9A" because it stops as soon as "A" is matched (lazy)

  • ([0-9][AB]*) matches the whole string ("9AB") because it consumes "A" and then goes on to match the following "B" (greedy)

Note that the second pattern matches a digit followed by zero or more "A" or "B" characters, with no upper limit.

Maroun
  • Thanks for the example, Maroun. I tried this example in Python and, instead of getting what you proposed, I got "9". This was my code: `x = re.search(r'[0-9][AB]*?', '9AB'); print x.group(0)` – akaii Apr 09 '16 at 21:39
  • Which one would be the correct result, "9" or "9A"? – akaii Apr 09 '16 at 22:07
  • 9 is correct because zero matches of `[AB]` are allowed. Greediness mainly matters when there's something after the greedy quantifier. – Jonathan Leffler Apr 09 '16 at 22:08
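
The point in Jonathan Leffler's comment can be checked with a small Python 3 sketch. The third pattern, with a trailing `$`, is an assumption added here to show what changes once something follows the lazy quantifier.

```python
import re

text = "9AB"

# Nothing follows the lazy quantifier, so zero repetitions of [AB] suffice.
lazy = re.search(r"[0-9][AB]*?", text).group(0)

# The greedy quantifier takes every A or B it can.
greedy = re.search(r"[0-9][AB]*", text).group(0)

# With $ after it, the lazy quantifier must grow until the end of the string.
lazy_anchored = re.search(r"[0-9][AB]*?$", text).group(0)

print(lazy)           # 9
print(greedy)         # 9AB
print(lazy_anchored)  # 9AB
```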