http://gskinner.com/RegExr/ provides this regex pattern for matching an HTML tag:

A) <[^<]+?> - Simplified example of matching HTML tags

It works. However, I changed the pattern as shown below, and it also works.

B) <[^<]+> or C) <[^<]+.>

What is the difference between A), B) and C)?

Thanks

Charles Yeung

1 Answer

+? is a lazy (non-greedy) quantifier: it matches as few characters as possible. For example, given <blah>>, the pattern <[^<]+?> matches <blah> even though it could have matched <blah>>, because it stops at the first > that lets the whole pattern succeed.

Conversely, + is greedy and matches as many characters as it can. Because [^<] excludes only <, it can also consume >, so <[^<]+> given <blah>> matches <blah>>: that is the longest text that still satisfies the regex.
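To make the lazy/greedy difference concrete, here is a small sketch in Python (my choice for illustration; the question is about RegExr, but Python's re module treats +? and + the same way for these patterns). The variable names are just for this example.

```python
import re

text = "<blah>>"

# A) lazy: [^<]+? takes as few characters as possible before the closing >
lazy = re.search(r"<[^<]+?>", text).group(0)

# B) greedy: [^<]+ takes as many characters as possible, including the first >
greedy = re.search(r"<[^<]+>", text).group(0)

print(lazy)    # <blah>
print(greedy)  # <blah>>
```

On input with no stray > after the tag, A) and B) return the same match, which is why both appeared to work.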

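. is a metacharacter meaning 'match any character' (in most regex flavors, any character except a newline by default), and on its own (no +, *, etc. after it) it means 'match exactly one character'. The . in +. does not modify the +; it is a separate element, so C) requires one additional character before the closing >.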
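A quick way to see that the . in C) is its own element is to try a tag with only one character inside. This is a sketch in Python (chosen for illustration, not something the question uses); the test strings are made up.

```python
import re

pattern_c = re.compile(r"<[^<]+.>")

# One character inside the tag: [^<]+ needs at least one character and
# . needs one more, so there is nothing left to satisfy both before the >.
one_char = pattern_c.search("<a>")

# Two characters inside: [^<]+ takes "a", . takes "b".
two_chars = pattern_c.search("<ab>").group(0)

print(one_char)   # None
print(two_chars)  # <ab>
```

So C) behaves like B) except that it needs at least two characters between < and >.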

This is why the example uses +?: an HTML tag closes at the first > encountered, and +? reflects this by stopping as soon as it can close the tag.

Patashu