2

I am looking through some old code bases and have come across two regular expression parts that I think are semantically identical. Wondering it the Stackoverflow community can confirm my understanding.

RegEx 1: (.+?) - One or more characters, but optional

RegEx 2: (.*) - Zero or more characters

I keep thinking of different scenarios but can't think of any input where both expressions wouldn't be the same.

Machavity
  • 30,841
  • 27
  • 92
  • 100
Babak Naffas
  • 12,395
  • 3
  • 34
  • 49

2 Answers2

7

(.+?) means match one or more character, but instead of the default greedy matching (match as much as possible), the ? after the quantifier makes the matching lazy (match as few as possible).

Conceptually, greedy matching will first try the longest possible sequence that can be formed by the pattern inside, then gradually reduce the length of the sequence as the engine backtracks. Lazy matching will first try the shortest possible sequence that can be formed by the pattern inside, then gradually increase the length of the sequence as the engine backtracks.

Therefore, (.+?) and (.*) are completely different. Given a string "abc", the pattern (.+?) will match "a" for the first match, while (.*) will match "abc" for the first match.

When you correct the pattern to the intended meaning: ((?:.+)?), it is exactly the same as (.*) in behavior. Since quantifier is greedy by default, ((?:.+)?) will first try the case of .+ before attempting the case of empty string. And .+ will try the longest sequence before the 1 character sequence. Therefore, the effect of ((?:.+)?) is exactly the same as (.*): it will find the longest sequence and backtrack gradually to the case of empty string.

nhahtdh
  • 55,989
  • 15
  • 126
  • 162
3

First,

. is any character

Next

* is zero or more
+ is one or more 
? is one or zero

You're thinking that .+? is one or more of any character and 0 or 1 of them I'm guessing? You're missing this:

Lazy modifier

*? is zero or more getting as few as possible
+? is one or more getting as few as possible

See here for further discussion Greedy vs. Reluctant vs. Possessive Quantifiers

Community
  • 1
  • 1
JasonG
  • 5,794
  • 4
  • 39
  • 67