
In Python, I'd like to match any occurrences of something between given expressions. For example:

dogdogacowadogdog  <-- search for a word between 'a' characters
<span>tiger<a>      <-- search for something between <span> and <a>

I'd like to match only the part in between, so the results would be cow and tiger respectively. However, when using these regexes:

r'a(.*)a'
r'<span>(.*)<a>'

I get the whole line printed, not only the part I am looking for (what is matched by (.*)). How can I extract just that part?

Bartłomiej Szałach

1 Answer


What you're looking for is non-greedy matching.

What is non-greedy matching?

.*, .+ and .? are greedy: they attempt to match as many characters as possible. Adding a question mark (?) after the quantifier makes it non-greedy, so it attempts to match as few characters as possible. .*? will match 0 characters if it can, and .+? will match just 1 if it can.
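As a rough illustration of the difference, here is a small sketch. The sample string is made up for this example (it is not from the question) and deliberately contains two closing <a> delimiters on one line:

```python
import re

# Hypothetical sample text with two <span>...<a> pairs on one line.
text = "<span>tiger<a> and <span>lion<a>"

# Greedy: .* runs as far as it can, so the group extends to the LAST <a>.
greedy = re.search(r'<span>(.*)<a>', text).group(1)

# Non-greedy: .*? stops at the FIRST <a> that lets the pattern succeed.
lazy = re.search(r'<span>(.*?)<a>', text).group(1)

print(greedy)  # tiger<a> and <span>lion
print(lazy)    # tiger
```

When the line holds only one closing delimiter, as in the question's examples, both forms capture the same text.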

Back to your question: you should use these as your patterns:

r'a(.*?)a'
r'<span>(.*?)<a>'

Next up is the matching itself:

If you use match = re.search(), you need match.group(1) rather than match.group(0) to get the contents of the group itself.

match.group(0) returns the entire match (including the parts before and after the group).

match.group(1) returns only the first group.

match.groups(), however, returns a tuple of only the groups (not the entire match), so match.groups()[0] is the first group.
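Putting this together with the strings from the question, a minimal sketch might look like the following. The re.findall call at the end is an addition not mentioned above; when a pattern has exactly one group, it returns that group's text for every match, which may suit the "any occurrences" case:

```python
import re

match = re.search(r'a(.*?)a', 'dogdogacowadogdog')

if match:  # re.search returns None when nothing matches
    print(match.group(0))   # acowa   (the entire match)
    print(match.group(1))   # cow     (the first group only)
    print(match.groups())   # ('cow',)

# With a single group in the pattern, findall returns a list of the
# group contents, one entry per match.
words = re.findall(r'<span>(.*?)<a>', '<span>tiger<a>')
print(words)  # ['tiger']
```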

Bharel