2

I know Perl-style regular expressions fairly well, but today I found one that I do not understand:

preg_match('/^On.+?wrote:.+?$/i',$line); //reduced example

What does the .+? mean? I understand .+ alone, and I understand .? alone. But .+? combined? It looks like a bug to me.

The line should match popular citation prefixes in an email body. The full pattern is much more complicated and includes lookbehinds, but this is the only part I can't understand, and the regexp still seems to work correctly.

SWilk
  • See http://stackoverflow.com/a/19405163/476 – deceze Apr 01 '14 at 08:51
  • See also [stackoverflow.com/questions/13705478](http://stackoverflow.com/questions/13705478/what-is-the-difference-between-the-regex-and/13705682#13705682) – stema Apr 01 '14 at 09:37

3 Answers

2

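+ means one or more and is greedy: it matches as much as possible. +? also means one or more, but it is lazy (non-greedy): it matches as little as possible, unlike the default greedy behavior of regex quantifiers.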

Edit: I wanted to explain it a little further, but the comment of deceze already explains enough.^^
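The difference can be seen in a minimal sketch. It uses Python's `re` module as a stand-in for PHP's `preg_match`, on the assumption that both treat `+` and `+?` the same way for this simple case; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text with two candidate end points for the match.
text = "<b>one</b> and <b>two</b>"

# Greedy: .+ grabs as much as possible, then backtracks to the LAST </b>.
greedy = re.search(r"<b>.+</b>", text).group(0)

# Lazy: .+? grabs as little as possible, stopping at the FIRST </b>.
lazy = re.search(r"<b>.+?</b>", text).group(0)

print(greedy)  # <b>one</b> and <b>two</b>
print(lazy)    # <b>one</b>
```

Both patterns match; they only disagree about where the match ends.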

Realitätsverlust
  • That simple. Face palm. I should have known that... Thank you :-) – SWilk Apr 01 '14 at 08:54
  • @SWilk regexes are pretty complicated, I don't think it's a reason to facepalm just because you didn't recognize a syntax immediately. I keep my regex cheatsheet next to me all the time because I always forget something. :) – Realitätsverlust Apr 01 '14 at 08:56
  • Yeah, but I was "our local regexp guru" at the office. And I have used much more advanced and unreadable features. And now I have stumbled upon some basic ungreediness. Nothing happened, just a slightly injured ego ;) – SWilk Apr 01 '14 at 09:05
2

In short, when you add ? the quantifier matches the least amount possible, whereas without ? it matches the most amount possible:

Here is the explanation:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
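Applied to the reduced pattern from the question, the table above can be checked with a small sketch. It uses Python's `re` as a stand-in for `preg_match`; the capture groups and the sample line are added here purely for illustration and are not part of the original pattern.

```python
import re

# Hypothetical quoted-reply line that contains "wrote:" twice.
line = "On Tue, Bob wrote: Alice wrote: hello"

# Lazy version, as in the question (groups added to expose what each part ate).
lazy = re.match(r"^On(.+?)wrote:(.+?)$", line, re.IGNORECASE)

# Greedy version for comparison.
greedy = re.match(r"^On(.+)wrote:(.+)$", line, re.IGNORECASE)

# Lazy: the first group stops at the first "wrote:"; the second group is
# still forced to run to the end of the line because of the $ anchor.
print(lazy.groups())    # (' Tue, Bob ', ' Alice wrote: hello')

# Greedy: the first group runs up to the last "wrote:".
print(greedy.groups())  # (' Tue, Bob wrote: Alice ', ' hello')
```

Both versions match the whole line, which is consistent with the pattern appearing to work either way; the laziness only changes how the text is divided between the two `.+` parts.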
Sabuj Hassan
0

Lazy not Greedy

.+? matches any character (except newline)

Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]

check at regex101.com
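The "as few times as possible, expanding as needed" behavior can also be sketched locally. This uses Python's `re` rather than PHP, assuming the same semantics for this case, and the input string is an arbitrary example.

```python
import re

s = "abc"

# With nothing after it, a lazy .+? is satisfied by a single character.
alone = re.match(r".+?", s).group(0)

# When something must follow, .+? expands one character at a time
# until the rest of the pattern ("c") can match.
expanded = re.match(r".+?c", s).group(0)

# The greedy form takes everything it can right away.
greedy = re.match(r".+", s).group(0)

print(alone)     # a
print(expanded)  # abc
print(greedy)    # abc
```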

pawel7318