
I used re.match to find a string like this:

print(re.match('''#include(\s)?".*"''', '''#include "my.h"'''))

and got this result:

<_sre.SRE_Match object; span=(0, 15), match='#include "my.h"'>

Then I replaced re.match with re.findall:

print(re.findall('''#include(\s)?".*"''', '''#include "my.h"'''))

This time the result was:

[' ']

I'm confused: why doesn't re.findall return the matched string? What's wrong with my regular expression?

Alan Moore

1 Answer


From help(re.findall):

Return a list of all non-overlapping matches in the string.

If one or more capturing groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

Empty matches are included in the result.

Your parenthesized bit, (\s), is a capturing group, so re.findall returns a list of the captures. There’s only one capturing group, so each item in the list is just a string, rather than a tuple.
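To make the rule concrete, here is a small sketch of how the shape of re.findall's result changes with the number of capturing groups. The variable names and the two-group pattern are only for illustration; the input string is the one from the question.

```python
import re

text = '#include "my.h"'

# No capturing group: each item is the whole match.
no_group = re.findall(r'#include\s?".*"', text)

# One capturing group: each item is that group's capture (here, the space).
one_group = re.findall(r'#include(\s)?".*"', text)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'#include(\s)?"(.*)"', text)

print(no_group)    # ['#include "my.h"']
print(one_group)   # [' ']
print(two_groups)  # [(' ', 'my.h')]
```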

You can make the group non-capturing using ?:, i.e. (?:\s)?. That isn’t very useful at that point, though, since it’s equivalent to just \s?. For more flexibility – e.g. if you ever need to capture more than one part – re.finditer is probably the best way to go:

import re

for m in re.finditer(r'#include\s*"(.*?)"', '#include "my.h"'):
    print('Included %s using %s' % (m.group(1), m.group(0)))
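As a rough side-by-side of the two options on a slightly longer input, the sketch below runs both the non-capturing pattern with re.findall and the re.finditer pattern. The two-line `source` string and the variable names are made up for this example.

```python
import re

# Hypothetical sample input with two include lines.
source = '#include "my.h"\n#include "util.h"\n'

# (?:...) groups without capturing, so findall returns the whole matches.
whole = re.findall(r'#include(?:\s)?".*"', source)

# finditer yields match objects, so the full match and each group
# are both available for every hit.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(r'#include\s*"(.*?)"', source)]

print(whole)  # ['#include "my.h"', '#include "util.h"']
print(pairs)  # [('#include "my.h"', 'my.h'), ('#include "util.h"', 'util.h')]
```

If only the header names are needed, re.findall with the single capturing group "(.*?)" would return them directly; finditer is the better fit when the full match, or several groups, are needed as well.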
Ry-