Pull regex match without its environment

Question

I'd like to match, in Python, any occurrences of something between two given expressions. For example:

dogdogacowadogdog  <-- search for a word between 'a' characters
<span>tiger<a>      <-- search for something between <span> and <a>

I'd like to match only the part in between, so it would be cow and tiger respectively. However, when using these regexes:

r'a(.*)a'
r'<span>(.*)<a>'

It prints the whole line and not only what I am looking for (the part matched by (.*)). How can I pull out just this information?

It sounds like you want `(.*?)` (lazy matching). Also I hope you're not parsing HTML with regex... — Alex Hall, Mar 25 '16 at 12:49

Bharel · Accepted Answer · 2016-03-25T13:31:43.857


What you're looking for is non-greedy (lazy) matching.

What is non-greedy matching?

.*, .+ and .? are greedy: they attempt to match as many characters as possible. Adding a question mark (?) after the quantifier makes it match as few characters as possible instead. .*? will match 0 characters if it can, and .+? will match 1.
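The difference only shows up when the closing delimiter appears more than once. A minimal sketch, assuming a made-up input string `acowadoga` (not from the question) that contains three 'a' characters:

```python
import re

# Hypothetical input with three 'a' characters, so the two quantifiers disagree.
text = "acowadoga"

# Greedy: .* grabs as much as it can, so the group runs up to the LAST 'a'.
greedy = re.search(r"a(.*)a", text).group(1)

# Lazy: .*? grabs as little as it can, so the group stops at the NEXT 'a'.
lazy = re.search(r"a(.*?)a", text).group(1)

print(greedy)  # cowadog
print(lazy)    # cow
```

On the question's own `dogdogacowadogdog` both patterns happen to capture the same text, because that string contains only two 'a' characters.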

Back to your question, you should use this as your regex query:

r'a(.*?)a'
r'<span>(.*?)<a>'

Next up is the matching itself:

If you use match = re.search(), you need match.group(1), not match.group(0), to get the group itself.

match.group(0) returns the entire match (including the parts before and after the group).

match.group(1) returns only the first group.

match.groups(), however, returns only the groups (not the entire match) as a tuple, so match.groups()[0] is the first group.
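As a rough illustration of the accessors described above, here is a short sketch using the two inputs from the question. The variable names are arbitrary, and the last input (two `<span>...<a>` pairs on one line) is an assumed example added to show `re.findall`:

```python
import re

match = re.search(r"a(.*?)a", "dogdogacowadogdog")

whole = match.group(0)       # entire match, delimiters included: 'acowa'
inner = match.group(1)       # first capturing group only: 'cow'
all_groups = match.groups()  # tuple of all groups: ('cow',)

# Same idea for the second pattern from the question.
tag = re.search(r"<span>(.*?)<a>", "<span>tiger<a>").group(1)  # 'tiger'

# With exactly one capturing group, re.findall returns just the captured
# text of every non-overlapping match (assumed input, not from the question).
names = re.findall(r"<span>(.*?)<a>", "<span>tiger<a> <span>lion<a>")

print(whole, inner, all_groups, tag, names)
```

Note that `re.search` returns `None` when nothing matches, so real code would normally check the result before calling `.group()`.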

edited Mar 25 '16 at 13:31

answered Mar 25 '16 at 12:51


Could you briefly explain why .group(1) should be used and not .group(0)? – Bartłomiej Szałach Mar 25 '16 at 12:55
group(0) is the whole match, not the first capturing group – Whitefret Mar 25 '16 at 12:57
@BartłomiejSzałach Further explained in the answer. The links point to the relevant parts of the `re` documentation for further explanation of that behavior. – Bharel Mar 25 '16 at 12:59
