Python regex re.findall failing

Question

Could someone help me figure out what is going on here? I want to get any number followed by "m" or "y".

Why does re.search() work correctly on each entry, while re.findall() fails when searching the joined string?

import re

a = ['COP', '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2']


rule = re.compile(r"^\d+[ym]$")
COP = [re.search(rule, entry)[0] for entry in a if (re.search(rule, entry))]
print(COP)
# OUTPUT >> ['6m', '9m', '1y', '18m', '2y', '3y', '4y', '5y', '6y', '7y', '8y', '9y', '10y', '20y']

However, this returns nothing:

rule1 = re.compile(r"\d+[ym]$")
a_str = " ".join(a) 
COP1 = re.findall(rule1, a_str)
print(COP1)
# OUTPUT >> []

I tried multiple options to no avail.

It is because of the `$` anchor in your `re.findall(rule1, " ".join(a))`: it means the match must end at the end of the whole string. Remove it and it will match them. — zamir, Jan 30 '20 at 19:53
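A minimal sketch of what the comment describes, using short hypothetical strings instead of the full list: with `$`, `re.findall` can only return a match that ends at the very end of the string, so at most one match, and none if the string ends with something else.

```python
import re

s = "6m 9m 1y 15.6 mm"  # hypothetical sample; it ends with "mm", not digits + y/m

# With "$" the match must end at the end of the whole string: nothing qualifies here.
anchored = re.findall(r"\d+[ym]$", s)

# Even when the string does end with a candidate, only that last one is returned.
last_only = re.findall(r"\d+[ym]$", "6m 9m")

# Without "$", every run of digits directly followed by "y" or "m" is returned.
unanchored = re.findall(r"\d+[ym]", s)

print(anchored)    # []
print(last_only)   # ['9m']
print(unanchored)  # ['6m', '9m', '1y']
```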

Jean-François Fabre · Accepted Answer · 2020-01-30T19:56:08.373


You're using a regex with start and end anchors: ^\d+[ym]$ (well, okay, the ^ has been removed, but the $ causes the same problem).

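It works on the individual strings, but as soon as you join them to create a kind of sentence, the anchors apply to the whole joined string. $ only matches at its very end, and that string ends with `9.2`, so your expression doesn't match anymore.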
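If you would rather keep the `^...$` anchors, one alternative (not what this answer proposes) is to join with newlines and pass `re.MULTILINE`, which makes `^` and `$` match at the start and end of every line. A sketch using the list from the question, assuming no entry contains a newline of its own:

```python
import re

a = ['COP', '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2']

# One entry per line, so each entry becomes its own "line" for ^ and $.
a_lines = "\n".join(a)

# With re.MULTILINE, ^ also matches after each newline and $ before each newline.
rule_ml = re.compile(r"^\d+[ym]$", re.MULTILINE)
result = rule_ml.findall(a_lines)
print(result)
# ['6m', '9m', '1y', '18m', '2y', '3y', '4y', '5y', '6y', '7y', '8y', '9y', '10y', '20y']
```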

Get rid of the start and end anchors and use word boundaries instead: r"\b\d+[ym]\b" (note the raw string prefix, which is necessary with \b: in a normal string literal, "\b" is a backspace character)

>>> import re
>>> r = re.compile(r"\b\d+[ym]\b")
>>> r.findall("6y 67m")
['6y', '67m']
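Applied to the joined string from the question, the word-boundary pattern should give the same list as the per-entry `re.search` loop. A quick check, reusing the question's list `a`:

```python
import re

a = ['COP', '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2']

rule_wb = re.compile(r"\b\d+[ym]\b")

# findall on the single joined string...
joined_result = rule_wb.findall(" ".join(a))

# ...compared with the anchored per-entry approach from the question
# (with both anchors, a matching entry is returned whole).
per_entry = [entry for entry in a if re.search(r"^\d+[ym]$", entry)]

print(joined_result == per_entry)  # True
```

The two approaches are not equivalent in general: `\b` also treats punctuation as a boundary, so `re.findall(r"\b\d+[ym]\b", "1.5y")` returns `['5y']`, whereas the anchored per-entry search rejects `"1.5y"`.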

(Without word boundaries, the pattern would also match inside strings you don't want matched, for instance `56y` inside `xx56yz`.)
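A small sketch of that case, on a hypothetical string containing the unwanted `xx56yz` next to a valid `6y`:

```python
import re

text = "xx56yz 6y"  # hypothetical sample; "xx56yz" is the unwanted case

# No boundaries: "56y" is found inside "xx56yz".
without_boundary = re.findall(r"\d+[ym]", text)

# With \b on both sides, the digits must start and the y/m must end a "word".
with_boundary = re.findall(r"\b\d+[ym]\b", text)

print(without_boundary)  # ['56y', '6y']
print(with_boundary)     # ['6y']
```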

edited Jan 30 '20 at 19:56

answered Jan 30 '20 at 19:53

Jean-François Fabre


Fabre, you are right, thank you so much. I removed the start anchor but not the end one. – MasterOfTheHouse Jan 30 '20 at 19:55

True. But removing only the start anchor can also match things you don't want, e.g. `eee6y`. – Jean-François Fabre Jan 30 '20 at 19:57
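For the per-entry approach, one way to avoid forgetting either anchor is `re.fullmatch` (Python 3.4+), which only succeeds when the pattern matches the entire string, so the pattern needs no `^` or `$` at all. A sketch with hypothetical sample entries:

```python
import re

entries = ["6y", "eee6y", "18m", "6yy", "15.6"]  # hypothetical sample entries

rule = re.compile(r"\d+[ym]")

# fullmatch succeeds only if the whole entry is digits followed by one "y" or "m".
full = [e for e in entries if rule.fullmatch(e)]

# For comparison: search with only the end anchor also accepts "eee6y".
end_only = [e for e in entries if re.search(r"\d+[ym]$", e)]

print(full)      # ['6y', '18m']
print(end_only)  # ['6y', 'eee6y', '18m']
```

One subtle difference: `$` also matches just before a trailing newline, so `re.search(r"^\d+[ym]$", "6y\n")` succeeds, while `rule.fullmatch("6y\n")` returns `None`.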
