.* matches any character except a newline (or any character at all, newlines included, if re.DOTALL is used), repeated zero or more times. It is greedy: it matches as much as it can.
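A minimal sketch of both points, greediness and the newline rule, using a made-up two-line string (text is an illustrative name, not from the original):

```python
import re

text = "abc\ndef"

# Without DOTALL, "." stops at the newline, so the greedy match covers the first line only.
first_line = re.match(".*", text).group()

# With re.DOTALL, "." also matches "\n", so the greedy match runs to the end of the string.
whole_text = re.match(".*", text, re.DOTALL).group()

print(repr(first_line))  # 'abc'
print(repr(whole_text))  # 'abc\ndef'
```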
(.*) matches the same text and also stores it in a capture group.
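To see what the capture group adds, compare the whole match with the captured part. This is a small sketch; the key=... input is a hypothetical example:

```python
import re

# A capture group records only the text matched inside the parentheses.
m = re.match("key=(.*)", "key=some value")

print(m.group(0))  # key=some value   (the whole match)
print(m.group(1))  # some value       (only the captured .* part)
print(m.groups())  # ('some value',)
```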
(.*?): the ? makes the .* non-greedy, so it matches as little as it can while still producing a match, and the parentheses make it a capture group as well.
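The difference between greedy and non-greedy shows up as soon as something follows the group in the pattern. A sketch using a made-up HTML-like string (regular expressions are not a real HTML parser; the input is only for illustration):

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: (.*) grabs as much as possible, so it runs to the LAST ">" in the string.
greedy = re.search("<(.*)>", html).group(1)

# Non-greedy: (.*?) stops at the FIRST ">" that lets the overall pattern succeed.
lazy = re.search("<(.*?)>", html).group(1)

# findall with the non-greedy pattern picks out each tag separately.
tags = re.findall("<(.*?)>", html)

print(greedy)  # b>bold</b> and <i>italic</i
print(lazy)    # b
print(tags)    # ['b', '/b', 'i', '/i']
```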
For example:
>>> import re
>>> txt = ''' foo
... bar
... baz '''
>>> for found in re.finditer('(.*)', txt):
...     print(found.groups())
...
(' foo',)
('',)
('bar',)
('',)
('baz ',)
('',)
>>> for found in re.finditer('.*', txt):
...     print(found.groups())
...
()
()
()
()
()
()
>>> for found in re.finditer('.*', txt, re.DOTALL):
...     print(found.groups())
...
()
()
>>> for found in re.finditer('(.*)', txt, re.DOTALL):
...     print(found.groups())
...
(' foo\nbar\nbaz ',)
('',)
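The empty tuples printed for the ungrouped .* do not mean nothing matched: groups() reports only capture groups, and a pattern without parentheses has none. Asking each match for group() (the whole match) recovers the text. A sketch reusing the same txt string as the session above:

```python
import re

txt = " foo\nbar\nbaz "  # same string as the triple-quoted txt in the session

# groups() is empty for a pattern with no parentheses...
group_tuples = [m.groups() for m in re.finditer(".*", txt)]

# ...but group() (same as group(0)) still returns the text each match covered.
plain = [m.group() for m in re.finditer(".*", txt)]
dotall = [m.group() for m in re.finditer(".*", txt, re.DOTALL)]

print(group_tuples)  # [(), (), (), (), (), ()]
print(plain)         # [' foo', '', 'bar', '', 'baz ', '']
print(dotall)        # [' foo\nbar\nbaz ', '']
```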
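And since the ? makes the group match as little as possible, it matches the empty string at every position. On Python 3.7 and later, finditer also lets a non-empty match start right where an empty match ended, so each empty match is followed by a one-character match (older versions print only the empty strings):
>>> for found in re.finditer('(.*?)', txt, re.DOTALL):
...     print(found.groups())
...
('',)
(' ',)
('',)
('f',)
('',)
('o',)
('',)
('o',)
('',)
('\n',)
('',)
('b',)
('',)
('a',)
('',)
('r',)
('',)
('\n',)
('',)
('b',)
('',)
('a',)
('',)
('z',)
('',)
(' ',)
('',)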
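The non-greedy group only stays empty because nothing follows it in the pattern. Once the pattern requires more text after the group, the group grows just far enough to reach the first place where the rest can match. A sketch on the same txt string; the trailing a in the pattern is an arbitrary choice for illustration:

```python
import re

txt = " foo\nbar\nbaz "  # same string as the triple-quoted txt in the session

# Non-greedy: the group grows only until the FIRST "a" lets the pattern finish.
lazy = re.match("(.*?)a", txt, re.DOTALL).group(1)

# Greedy: the group takes everything up to the LAST "a" in the string.
greedy = re.match("(.*)a", txt, re.DOTALL).group(1)

print(repr(lazy))    # ' foo\nb'
print(repr(greedy))  # ' foo\nbar\nb'
```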