9

What is the exact definition of group(0) in re.search?

Sometimes a search gets complex, and I would like to know what value group(0) is supposed to have by definition.

Just to give an example of where the confusion comes from, consider this match. The printed result is only def, so it looks as if group(0) didn't return the entire match.

>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
apadana

3 Answers

18

match_object.group(0) returns the entire substring that the pattern matched.

In addition, group(0) can be explained by comparing it with group(1), group(2), group(3), ..., group(n). group(0) is the whole match. Parentheses define further capturing groups: group(1) is the text matched by the first pair of parentheses, group(2) the text matched by the second pair, and so on. Groups are numbered by the position of their opening parenthesis, counting from left to right, and each group ends at the closing parenthesis that matches its opening one, so groups can be nested. This probably sounds confusing, which is why there is an example below.

But you need to distinguish these from the parentheses in '(?<=abc)'. They have a different syntactic meaning: they delimit what belongs to '?<=' and do not create a numbered group. So the main problem is understanding what '?<=' does. It is a so-called look-behind: a zero-width assertion that succeeds only at a position immediately preceded by the enclosed expression, without making that expression part of the match.

In the following example 'abc' is bound by the look-behind.

No parentheses are needed to form group 0, since it is the whole match anyway.

The opening parenthesis in front of the letter 'd' pairs with the closing parenthesis in front of the letter 'f' to form group 1.

The parentheses around the letter 'e' define group 2.

import re

m = re.search('(?<=abc)(d(e))f', 'abcdef')

print(m.group(0))
print(m.group(1))
print(m.group(2))

This prints:

def
de
e
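As a small sketch of the same example, the match spans show where each group sits in the input. The variable names `text` and `whole` are only illustrative. Note that the look-behind parentheses do not show up as a numbered group.

```python
import re

text = 'abcdef'
m = re.search(r'(?<=abc)(d(e))f', text)

# group(0) is exactly the slice of the input covered by the match
whole = text[m.start():m.end()]
print(whole == m.group(0))   # True

# groups are numbered by the position of their opening parenthesis;
# the look-behind (?<=abc) is not a capturing group
print(m.groups())            # ('de', 'e')
print(m.span(0), m.span(1), m.span(2))  # (3, 6) (3, 5) (4, 5)
```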
ah bon
manuel_va
9

group(0) returns the full string matched by the regex. It's just that abc isn't part of the match. (?<=abc) doesn't match abc - it matches any position in the string immediately preceded by abc.
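A quick way to see this is to compare a pattern that consumes abc with one that only asserts it. This is a minimal sketch; the names `consuming`, `lookbehind`, and `no_match` are made up for illustration.

```python
import re

text = 'abcdef'
consuming = re.search(r'abcdef', text)         # abc is part of the match
lookbehind = re.search(r'(?<=abc)def', text)   # abc is only required before it

print(consuming.group(0), consuming.span())    # abcdef (0, 6)
print(lookbehind.group(0), lookbehind.span())  # def (3, 6)

# Without the required prefix the assertion fails, so there is no match
no_match = re.search(r'(?<=abc)def', 'xyzdef')
print(no_match)  # None
```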

user2357112
0

supplementary:

run this:

import re
m = re.search('text', 'my text')
help(m.group)

print(m.group(0) == m.group())

# when in doubt, dir(m) helps too

output:

Help on built-in function group:

group(...) method of re.Match instance
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.

True
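A short sketch of the "str or tuple" part of that help text, using a slightly different pattern with two groups (the pattern is an assumption for illustration, not from the snippet above). Indexing with m[0] assumes Python 3.6 or later.

```python
import re

m = re.search(r'(\w+) (\w+)', 'my text')

# one argument (or none) returns a string
print(m[0])            # my text
# several arguments return a tuple
print(m.group(1, 2))   # ('my', 'text')
# three spellings of the whole match
print(m.group(0) == m.group() == m[0])  # True
```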
eliu