
Could someone explain the difference between these three patterns:

1 -> (.*)
2 -> (.*?)
3 -> .*

As I understand it, ? makes the preceding character optional, so why put it here? And why not put the parenthesis at the end?

This comes from here: http://www.tutorialspoint.com/python/python_reg_expressions.htm

First example: searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)
Russia Must Remove Putin
John Doe



.* matches zero or more of any character except a newline (newlines too if re.DOTALL is used). It is greedy: it matches as much as it can.
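A minimal Python 3 sketch of those points, using a made-up two-line string as an assumed input: without re.DOTALL the greedy .* stops just before the first newline, with it the same pattern runs to the end of the string, and because * allows zero repetitions it also matches an empty string.

```python
import re

text = "first line\nsecond line"  # hypothetical sample input

# By default "." does not match "\n", so greedy .* stops before the newline.
default_match = re.match(r".*", text).group()
# With re.DOTALL, "." also matches "\n", so .* consumes the whole string.
dotall_match = re.match(r".*", text, re.DOTALL).group()
# "*" means zero or more, so .* also matches an empty string.
empty_match = re.match(r".*", "").group()

print(repr(default_match))  # 'first line'
print(repr(dotall_match))   # 'first line\nsecond line'
print(repr(empty_match))    # ''
```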

(.*) matches the same text, and the parentheses also capture it in a group.

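(.*?): the ? makes the .* non-greedy (lazy), so it matches as little as it can while still letting the whole pattern match, and the parentheses make it a capture group as well. A ? placed right after a quantifier such as * modifies that quantifier; it only means "optional" when it follows an ordinary character or group.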
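A short Python 3 sketch of the difference on the pattern from the question. The input line is a made-up sample (an assumption, not necessarily the line the tutorial uses). The second half shows the two roles of ?: after an ordinary character it makes that character optional, while after * it makes the repetition lazy.

```python
import re

line = "Cats are smarter than dogs"  # hypothetical sample input

# The question's pattern: group 2 is lazy, so it stops at the FIRST space.
lazy = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)
# Same pattern with a greedy group 2: it runs to the LAST space that
# still lets the trailing " .*" match.
greedy = re.search(r'(.*) are (.*) .*', line, re.M | re.I)

print(lazy.groups())    # ('Cats', 'smarter')
print(greedy.groups())  # ('Cats', 'smarter than')

# "?" after an ordinary character: the "u" is optional.
optional = [w for w in ("color", "colour", "colr") if re.fullmatch(r'colou?r', w)]
print(optional)         # ['color', 'colour']

# "?" after "*": lazy repetition. a* takes all three a's, a*? takes none.
greedy_a = re.match(r'a*', 'aaa').group()
lazy_a = re.match(r'a*?', 'aaa').group()
print(repr(greedy_a), repr(lazy_a))  # 'aaa' ''
```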

For example:

>>> import re
>>> txt = ''' foo
... bar
... baz '''
>>> for found in re.finditer('(.*)', txt):
...     print(found.groups())
... 
(' foo',)
('',)
('bar',)
('',)
('baz ',)
('',)
>>> for found in re.finditer('.*', txt):
...     print(found.groups())
... 
()
()
()
()
()
()
>>> for found in re.finditer('.*', txt, re.DOTALL):
...     print(found.groups())
... 
()
()
>>> for found in re.finditer('(.*)', txt, re.DOTALL):
...     print(found.groups())
... 
(' foo\nbar\nbaz ',)
('',)

And since *? matches as little as possible, it matches the empty string at every position. Since Python 3.7, a non-empty match can also start just after a previous empty match, so each empty match is followed by a single character; Python 2 and earlier Python 3 versions yield only the empty strings:

>>> for found in re.finditer('(.*?)', txt, re.DOTALL):
...     print(found.groups())
... 
('',)
(' ',)
('',)
('f',)
('',)
('o',)
('',)
('o',)
('',)
('\n',)
('',)
('b',)
('',)
('a',)
('',)
('r',)
('',)
('\n',)
('',)
('b',)
('',)
('a',)
('',)
('z',)
('',)
(' ',)
('',)
Russia Must Remove Putin
  • An example: matching `(.*)h` against `"this thing"` captures `"this t"` (the longest possible string followed by an h); matching `(.*?)h` would capture `"t"` instead (the shortest possible string followed by an h). – Hugh Bothwell Jan 10 '15 at 21:33
  • Well, thank you all, very interesting links too. As I understand it now, English speakers think of "greedy" as the one who takes a lot, whereas French speakers think of "greedy" as the one who gives back the minimum. ;) – John Doe Jan 10 '15 at 21:45
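Hugh Bothwell's `"this thing"` example from the comments can be checked with a short Python 3 sketch; group(1) holds the text captured before the h.

```python
import re

s = "this thing"

# Greedy: (.*) first takes the whole string, then backs off to the LAST "h".
greedy = re.search(r'(.*)h', s).group(1)
# Lazy: (.*?) grows one character at a time and stops at the FIRST "h".
lazy = re.search(r'(.*?)h', s).group(1)

print(greedy)  # this t
print(lazy)    # t
```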