+? is what is known as a lazy quantifier. Lazy matching will match as few characters as possible. For example, <[^<]+?> given <blah>> will match <blah>, even though it could have matched <blah>>, because it matches the fewest characters it can.
Conversely, + is known as greedy and matches as many characters as it can. Given the same input, <[^<]+> will match <blah>>, since that is the most it can match while still satisfying the regex.
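To see the difference side by side, here is a minimal sketch using Python's re module (the choice of Python is an assumption; the question does not say which regex engine is in use, though most engines behave the same way here). The variable names are only for illustration.

```python
import re

text = "<blah>>"

# Lazy: [^<]+? grows one character at a time and stops at the first '>'.
lazy = re.search(r"<[^<]+?>", text).group(0)

# Greedy: [^<]+ grabs everything it can, then backs off just far enough
# for the final '>' in the pattern to match.
greedy = re.search(r"<[^<]+>", text).group(0)

print(lazy)    # <blah>
print(greedy)  # <blah>>
```

Both patterns are identical apart from the ? after the +, so the difference in output comes only from lazy versus greedy matching.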
. is a metacharacter meaning 'match any character' (by default, any character except a newline), and on its own (no +, *, etc. after it) it means 'match exactly one character'. The . in +. does not modify the +; it is a new element.
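A small sketch of how . behaves as its own element, again assuming Python's re module. The pattern a+. is a made-up example (not from the question) chosen so that the + has something to repeat.

```python
import re

# A bare '.' matches exactly one character per match; by default it skips '\n'.
single_chars = re.findall(r".", "ab\n")

# With re.DOTALL the newline is matched as well.
single_chars_dotall = re.findall(r".", "ab\n", re.DOTALL)

# In 'a+.', the '+' repeats the 'a'; the '.' is a separate element that
# requires one more character after the run of 'a's.
with_extra_char = re.fullmatch(r"a+.", "ab")
without_extra_char = re.fullmatch(r"a+.", "a")

print(single_chars)                # ['a', 'b']
print(single_chars_dotall)         # ['a', 'b', '\n']
print(with_extra_char is not None) # True
print(without_extra_char is None)  # True
```

The last case fails because a+ must consume at least one a, leaving nothing for the . to match.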
As you can see, we use +? because an HTML tag closes on the first > encountered, and +? reflects this by ending the match as soon as it can close the HTML tag.
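As a rough illustration of why this matters for tags, the sketch below (Python's re module assumed, with a made-up input string) runs both variants over a fragment whose text content contains a stray >.

```python
import re

html = "<p>1 > 0</p>"  # hypothetical input with a '>' inside the text

# Lazy: each match ends at the first '>' after the opening '<'.
tags_lazy = re.findall(r"<[^<]+?>", html)

# Greedy: the first match runs on to the last '>' before the next '<'.
tags_greedy = re.findall(r"<[^<]+>", html)

print(tags_lazy)    # ['<p>', '</p>']
print(tags_greedy)  # ['<p>1 >', '</p>']
```

The lazy version returns the two tags cleanly, while the greedy version swallows part of the text into the first match.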