
I am trying to match two string variables, and would like to catch multiple matches. re.findall seems like the obvious choice for this task, but it doesn't appear to be working the way I would expect it to. The following is an example:

import re
a = 'a(pp)?le'
b = 'ale, apple, apol'
match = re.findall(a,b)
match
['','pp']

However, when I pass the same variables to re.search, it applies the regular expression to the string and returns the first whole match:

match = re.search(a,b)
match.group()
'ale'

Can anyone explain why re.findall is not working in this instance? I would expect the following:

match = re.findall(a,b)
match
['ale','apple']

Thanks!

user1185790

2 Answers


You are using a capturing group, whereas you want a non-capturing group:

a = 'a(?:pp)?le'

As stated in the docs, (...) in a regex creates a "capturing group", and re.findall then returns only what is inside the parentheses.

If you just want to group things (e.g. for the purpose of applying a ?), use (?:...), which creates a non-capturing group. When the pattern has no capturing groups, re.findall returns the whole match for each occurrence.
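A minimal sketch comparing the two patterns on the string from the question; the variable names are illustrative only:

```python
import re

text = 'ale, apple, apol'

# Capturing group: findall returns the group's contents for each match.
capturing = re.findall(r'a(pp)?le', text)

# Non-capturing group: the pattern has no groups, so findall returns each whole match.
non_capturing = re.findall(r'a(?:pp)?le', text)

print(capturing)      # ['', 'pp']
print(non_capturing)  # ['ale', 'apple']
```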

The key part of the re.findall docs is: "If one or more groups are present in the pattern, return a list of groups." This explains the difference in results between re.findall and re.search, whose match.group() always returns the whole match.
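If you would rather keep the capturing group in the pattern, one alternative (not from the answer above, just a sketch) is re.finditer, which yields match objects like re.search does, so group(0) gives each whole match:

```python
import re

pattern = r'a(pp)?le'  # the original pattern, capturing group kept
text = 'ale, apple, apol'

# re.search returns a match object; group() (same as group(0)) is the whole match.
first = re.search(pattern, text).group()

# re.finditer yields one match object per occurrence, so group(0) gives
# every whole match even though the pattern contains a capturing group.
whole_matches = [m.group(0) for m in re.finditer(pattern, text)]

print(first)          # ale
print(whole_matches)  # ['ale', 'apple']
```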

cmh

Let me quote the Python docs about re.findall():

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
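To illustrate the "list of tuples" case from the quoted docs, here is a small sketch that wraps the whole pattern in an extra outer group (an illustrative choice, not something the question requires), so each match yields two groups:

```python
import re

text = 'ale, apple, apol'

# Two groups: findall returns one tuple per match, one element per group.
# The outer group captures the full match, the inner one captures 'pp'.
pairs = re.findall(r'(a(pp)?le)', text)

print(pairs)  # [('ale', ''), ('apple', 'pp')]
```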

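That is exactly what happens with your expression a(pp)?le: because it contains a group, re.findall returns the content of that group for each match, i.e. pp for apple (and an empty string for ale, where the optional group did not take part). You can always disable this special behavior of a group by using a non-capturing group (?:...) instead.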
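A quick sketch of where the empty string in ['', 'pp'] comes from: for ale the optional group does not participate, which re.search reports as None and re.findall reports as '':

```python
import re

pattern = r'a(pp)?le'
text = 'ale, apple, apol'

# The first match is 'ale'; the optional (pp) group does not participate.
m = re.search(pattern, text)
print(m.group(0), repr(m.group(1)))  # ale None

# findall substitutes an empty string for a non-participating group.
result = re.findall(pattern, text)
print(result)  # ['', 'pp']
```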

pemistahl