I have a pattern like this (for finding three-word abbreviations):

s = r'([A-Z][a-z]+ ){2,4}\([A-Z]{2,4}\)'

and I want to use it to find the abbreviation in this line:

import re
line = 'National Health Service (NHS)'
p = re.findall(s, line)

but p is only ['Service '] and not the whole string. Why?

Dirk N

1 Answer

Your pattern uses a capturing group, which changes what .findall() returns. Use a non-capturing group instead:

s = r'(?:[A-Z][a-z]+ ){2,4}\([A-Z]{2,4}\)'

.findall() returns the whole match unless the pattern defines capturing groups, written (...), in which case it returns the contents of the groups instead. Your group is repeated by {2,4}, and a repeated group keeps only its last repetition, which is why you get only 'Service '. The pattern above uses a non-capturing group, written (?:...). Since that leaves the expression without any capturing groups, .findall() returns full matches again.
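
To see the difference side by side, here is a minimal sketch that runs both patterns against the line from the question. The variable names capturing and non_capturing are only illustrative, and the last line shows one possible alternative (reading group 0 from re.finditer()) if you would rather keep the original pattern unchanged.

```python
import re

line = 'National Health Service (NHS)'

# Capturing group: findall() returns the group's content, and a repeated
# group keeps only its last repetition.
capturing = r'([A-Z][a-z]+ ){2,4}\([A-Z]{2,4}\)'
print(re.findall(capturing, line))      # ['Service ']

# Non-capturing group: no groups remain, so findall() returns the full match.
non_capturing = r'(?:[A-Z][a-z]+ ){2,4}\([A-Z]{2,4}\)'
print(re.findall(non_capturing, line))  # ['National Health Service (NHS)']

# Alternative: keep the capturing pattern and take group 0 (the whole match)
# from each match object returned by finditer().
print([m.group(0) for m in re.finditer(capturing, line)])
```

Both approaches should give the same full match here. The non-capturing group is the smaller change if you do not need the individual words.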

Martijn Pieters