46

Can someone please explain the difference between .+ and .+?

I have the string: "extend cup end table"

  1. The pattern e.+d finds: extend cup end
  2. The pattern e.+?d finds: extend and end

I know that + means one or more and ? means zero or one, but I don't understand how they work together here.

kapex
nakul
  • As mentioned below, it's the difference between greedy and lazy quantifiers. Greedy quantifiers consume as much as possible, lazy ones as little as possible. With a lazy quantifier, the engine builds up the match character by character, from left to right. A greedy quantifier does the opposite: it consumes as much as possible and then gives characters back, from right to left, if it has to. See the following examples: http://regex101.com/r/dG9zZ2 and http://regex101.com/r/tP5xQ3 – Firas Dib Jan 08 '13 at 12:28

2 Answers

64

Both will match any sequence of one or more characters. The difference is that:

  • .+ is greedy and consumes as many characters as it can.
  • .+? is reluctant and consumes as few characters as it can.

See Differences Among Greedy, Reluctant, and Possessive Quantifiers in the Java tutorial.

Thus:

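  • e.+d starts at the first e and finds the longest substring from there that ends with d (with at least one character in between). In your example extend cup end will be found.
  • e.+?d starts at the first e and stops at the nearest d that follows (again with at least one character in between), then resumes searching after that match. In your example, extend and end are two such non-overlapping matches, so it finds both. Note that this is not always the shortest possible substring: the end inside extend is shorter, but matching begins at the leftmost e.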
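The two results can be reproduced with a short script. This is a minimal sketch in Python (the question does not name a language, so that choice is an assumption); Python's re module supports the same greedy and reluctant quantifier syntax.

```python
import re

text = "extend cup end table"

# Greedy: .+ grabs as much as it can, then gives characters back until d matches.
greedy = re.findall(r"e.+d", text)

# Reluctant (lazy): .+? grabs as little as it can, growing only until d matches.
lazy = re.findall(r"e.+?d", text)

print(greedy)  # ['extend cup end']
print(lazy)    # ['extend', 'end']
```

Note that the lazy pattern returns extend rather than the shorter end inside it, because the search begins at the leftmost e.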
robsch
NPE
  • Just tested the expressions on rubular.com, and I'm actually a bit baffled why adding ? makes the expression ignore "cup". Your answer doesn't really explain that in detail, though. Would it be possible to add a line or two about it? – Henrik Aasted Sørensen Jan 08 '13 at 11:26
  • @Henrik The result was like this in the question, there was weird formatting in the original question which I failed to edit correctly at first try – kapex Jan 08 '13 at 11:32
18

The regex e.+?d matches an 'e' and then tries to match as few characters as possible (ungreedy or reluctant), followed by a 'd'. That is why the following 2 substrings are matched:

extend cup end table
^^^^^^     ^^^
  1         2

The regex e.+d matches an 'e' and then tries to match as many characters as possible (greedy), followed by a 'd'. What happens is that the first 'e' is found, and then the .+ matches as much as it can (till the end of the line, or input):

extend cup end table
^^^^^^^^^^^^^^^^^^^^

The regex engine comes to the end of the line (or input) and can't match the 'd' in the regex pattern. So it backtracks, giving up one character at a time, until it reaches the last 'd' it saw. That is why a single match is found:

extend cup end table
^^^^^^^^^^^^^^<----- backtrack
  1      
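The backtracking described above can be imitated by hand. The following Python sketch is a simplified model, not how a real regex engine is implemented: the function name match_e_dot_plus_d and its parameters are hypothetical, it only handles these two patterns at a single start position, and it assumes the input has no newlines (which . would not match by default).

```python
def match_e_dot_plus_d(text, start, greedy):
    """Model e.+d (greedy=True) or e.+?d (greedy=False) at one start index."""
    if start >= len(text) or text[start] != "e":
        return None
    # .+ must consume at least one character, so d sits at start + 2 or later.
    candidates = range(start + 2, len(text))
    if greedy:
        # Greedy: begin with the longest run and give characters back.
        candidates = reversed(candidates)
    # Reluctant: keep the natural order, so the run grows one character at a time.
    for d_index in candidates:
        if text[d_index] == "d":
            return text[start:d_index + 1]
    return None


text = "extend cup end table"
print(match_e_dot_plus_d(text, 0, greedy=True))    # extend cup end
print(match_e_dot_plus_d(text, 0, greedy=False))   # extend
print(match_e_dot_plus_d(text, 11, greedy=False))  # end
```

The only difference between the two modes is the order in which candidate positions for the final d are tried: right to left for greedy, left to right for reluctant.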
Bart Kiers