
Consider this string:

s="""A25-54 plus affinities targeting,Demo (AA F21-54),
A25-49 Artist Affinity Targeting,M21-49 plus,plus plus A 21+ targeting"""

I am looking to fix my pattern which currently does not pull all the age groups in the string (A 21+ is missing from the current output).

Current try:

import re
re.findall(r'(?:A|A |AA F|M)(\d+-\d+)',s)

Output:

['25-54', '21-54', '25-49', '21-49']  # does not capture the last group, A 21+

Expected Output:

['A25-54','AA F21-54','A25-49','M21-49','A 21+']

As you can see, I would also like to get the last group, A 21+, which is currently missing from my output.

I would also like to get the prefix string associated with each age group. Apart from not capturing all the groups, my current output lacks the text before the age range: for example, I want 'A25-54' instead of '25-54'. I guess this is because of the ?: .

Appreciate any help I can get.

anky

1 Answer

The prefix is missing from each result because your pattern contains one capturing group, and when a regex contains a capturing group, re.findall returns only the captured part. The second issue is that, after the first run of digits, the pattern must match either a - followed by one or more digits or a literal + symbol.
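To illustrate the first point, here is a minimal sketch comparing the question's pattern with the same pattern minus its capturing group. The variable names `with_group` and `without_group` are only illustrative. Note that dropping the group alone still misses A 21+, which is the second issue.

```python
import re

s = """A25-54 plus affinities targeting,Demo (AA F21-54),
A25-49 Artist Affinity Targeting,M21-49 plus,plus plus A 21+ targeting"""

# One capturing group: re.findall returns only what the group captured.
with_group = re.findall(r'(?:A|A |AA F|M)(\d+-\d+)', s)

# No capturing group: re.findall returns the whole match, prefix included.
without_group = re.findall(r'(?:A|A |AA F|M)\d+-\d+', s)

print(with_group)
print(without_group)
```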

You may use

(?:A|A |AA F|M)\d+(?:-\d+|\+)

NOTE: You might want to add a word boundary at the start to only match those A, AA F, etc. as whole words: r'\b(?:A|A |AA F|M)\d+(?:-\d+|\+)'.
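As a rough sketch of what the word boundary changes, the snippet below runs both variants on a made-up string (`sample` is hypothetical and not from the question) in which a prefix letter is the tail of a longer token. On the question's own string both variants should return the same list.

```python
import re

unbounded = r'(?:A|A |AA F|M)\d+(?:-\d+|\+)'
bounded = r'\b(?:A|A |AA F|M)\d+(?:-\d+|\+)'

# Hypothetical input: "DMA25-54" ends with something that looks like an age group.
sample = "DMA25-54 and A18-34"

# Without \b the trailing "A25-54" of "DMA25-54" is matched as well.
loose = re.findall(unbounded, sample)

# With \b the prefix must start a word, so only "A18-34" is matched.
strict = re.findall(bounded, sample)

print(loose)
print(strict)
```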

Details

  • (?:A|A |AA F|M) - a non-capturing group matching A, A followed by a space, AA F, or M
  • \d+ - 1+ digits
  • (?:-\d+|\+) - a non-capturing group matching - and 1+ digits after it or a single + symbol.

Python demo:

import re
s="""A25-54 plus affinities targeting,Demo (AA F21-54),
A25-49 Artist Affinity Targeting,M21-49 plus,plus plus A 21+ targeting"""
print(re.findall(r'(?:A|A |AA F|M)\d+(?:-\d+|\+)',s))
# => ['A25-54', 'AA F21-54', 'A25-49', 'M21-49', 'A 21+']
Wiktor Stribiżew
    FYI: You may contract the first group further as `(?:A(?: |A F)?|M)\d+(?:-\d+|\+)`. It is less readable, but it follows the best practice that alternatives inside a group should not match at the same location as one another. – Wiktor Stribiżew Jun 12 '19 at 11:34
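A quick way to check the contracted pattern from the comment is to run it next to the original one on the question's string. This is only a sanity check on this one input, not a proof that the two patterns are equivalent in general; the names `original` and `contracted` are illustrative.

```python
import re

s = """A25-54 plus affinities targeting,Demo (AA F21-54),
A25-49 Artist Affinity Targeting,M21-49 plus,plus plus A 21+ targeting"""

original = r'(?:A|A |AA F|M)\d+(?:-\d+|\+)'
# Contracted form: "A" optionally followed by " " or "A F", or else "M".
contracted = r'(?:A(?: |A F)?|M)\d+(?:-\d+|\+)'

original_result = re.findall(original, s)
contracted_result = re.findall(contracted, s)

print(original_result)
print(contracted_result)
```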