
Why do the * and + greedy quantifiers behave differently in the regexp patterns below?

This is my text:

hello abcdef ghijklmc happiness<span>Lorem impsum</span> lorem <p>Lorem impsum</p>Lorem impsum Today is Feb 23rd, 2003

This is the regexp:

<[/]?[a-z].*?>

Result:

[screenshot of the match result; image not available]

With this pattern:

<[/]?[a-z].+?>

Result:

[screenshot of the match result; image not available]

uzay95
  • 1
    Please [for the love of god](http://stackoverflow.com/a/1732454/1348195) don't parse HTML with regular expressions. It will give you a world of pain and the DOM already contains _extremely_ powerful methods to work with HTML directly. – Benjamin Gruenbaum Dec 18 '14 at 11:48
  • 3
    `+?` and `*?` are not greedy quantifiers but the opposite - they are "lazy" or "reluctant". – Kobi Dec 18 '14 at 11:52

1 Answer


Because * is 0 or more and + is 1 or more.

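When the tag name has only one character in it (as in <p>):

  1. [a-z] matches the p
  2. . is the next token in the pattern, and the next character in the text is the >
    • With +, the . has to match at least once, so it consumes the >. The quantifier then keeps extending, one character at a time, until it reaches the next > (at the end of the next tag).
    • With *, the . doesn't have to match anything (zero matches are allowed), so the literal > in the pattern matches that character instead.
  3. The final > in the pattern matches the > where the quantifier stopped: the tag's own > with *, the next tag's > with +.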
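
The steps above can be checked with a short script. This is a minimal sketch in Python, which is an assumption, since the question does not say which regex engine it used. The variable names `text`, `star_matches` and `plus_matches` are illustrative only. Both patterns are copied unchanged from the question.

```python
import re

# Sample text from the question, split across two string literals for readability.
text = ("hello abcdef ghijklmc happiness<span>Lorem impsum</span> lorem "
        "<p>Lorem impsum</p>Lorem impsum Today is Feb 23rd, 2003")

# .*? may match zero characters, so a one-letter tag such as <p> matches on its own.
star_matches = re.findall(r"<[/]?[a-z].*?>", text)

# .+? must match at least one character, so after the p it consumes the > and
# keeps going until the next > in the text, which closes </p>.
plus_matches = re.findall(r"<[/]?[a-z].+?>", text)

print(star_matches)  # ['<span>', '</span>', '<p>', '</p>']
print(plus_matches)  # ['<span>', '</span>', '<p>Lorem impsum</p>']
```

The span tags match the same way under both patterns because the tag name has more than one character, so there is always at least one character ("pan") for the . to match before the closing >.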
Quentin