Regex matches the whole line instead of the text between the tags

Question

I am new to regex and just testing it out. Having looked at examples, my problem is that my regex matches almost the whole line instead of only the text between the tags.

>>> import re
>>> re.findall(r'<i>(.*)</i>', 'test <i>abc</i> <i>def</i>')
['abc</i> <i>def']

Why does it not match just the text between the tags, giving me abc and def?

For testing it out, this is fine. If you really want to parse HTML with regular expression, please see this post: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Hyperboreus, Dec 08 '13 at 08:21

hwnd · Accepted Answer · 2013-12-08T08:25:56.663

3

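You are using .*, which is greedy: it matches as much text as possible, so the match runs up to the last </i> in the string rather than the first. Add ? after the * (giving .*?) to make it non-greedy.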

>>> re.findall(r'<i>(.*?)</i>', 'test <i>abc</i> <i>def</i>')
['abc', 'def']
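
To see the difference side by side, here is a small sketch that runs both patterns on the string from the question. The third pattern, a negated character class, is an extra alternative that is not part of the original answer; it assumes the text inside the tags never contains a < character.

```python
import re

text = 'test <i>abc</i> <i>def</i>'

# Greedy: .* grabs everything to the end of the string, then backtracks
# only as far as needed, which is the LAST </i>.
greedy = re.findall(r'<i>(.*)</i>', text)

# Non-greedy: .*? grows one character at a time and stops at the FIRST </i>.
lazy = re.findall(r'<i>(.*?)</i>', text)

# Alternative (assumption: the tag content contains no '<'):
# [^<]* cannot run past the start of the closing tag.
negated = re.findall(r'<i>([^<]*)</i>', text)

print(greedy)   # ['abc</i> <i>def']
print(lazy)     # ['abc', 'def']
print(negated)  # ['abc', 'def']
```

Both the non-greedy and the negated-class versions return the two separate pieces here; they can behave differently on input with nested or malformed tags.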

From the re documentation:

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
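
The documentation's example can be checked directly. This is a minimal sketch using re.match with the patterns <.*> and <.*?> against '<H1>title</H1>'; the variable names are only for illustration.

```python
import re

s = '<H1>title</H1>'

# Greedy: .* extends to the last '>' in the string.
greedy_match = re.match(r'<.*>', s).group()

# Non-greedy: .*? stops at the first '>' it can.
lazy_match = re.match(r'<.*?>', s).group()

print(greedy_match)  # <H1>title</H1>
print(lazy_match)    # <H1>
```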

edited Dec 08 '13 at 08:25

answered Dec 08 '13 at 08:19


@fscore [Regular Expressions Tutorial](http://www.regular-expressions.info/tutorial.html). Good luck. – Steve P. Dec 08 '13 at 08:49
