re.search() not matching all expected strings

Question

I have the following function, to which I have added print statements for testing:

import re

def parse_tag_id(id_string):
    if not isinstance(id_string, str):
        id_string = str(id_string)
    if re.search(f'[0-9]{5}', id_string):
        print(f'MATCH: #{id_string}#')    # I put the '#' around each to make sure there are no hidden whitespaces.
    else:
        print(f'NO MATCH: #{id_string}#')
    return None

I am then applying this to a column of a pandas DataFrame and am getting the following results:

MATCH: #73844 / 73845#
MATCH: #73844 / 73845#
MATCH: #83793 / 84758#
MATCH: #73844 / 73845 / 84122 / 84136#
MATCH: #73844 / 73845 / 84136#
NO MATCH: #Not live yet#
NO MATCH: #83046#                         INCORRECT
MATCH: #84120 / 82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84264#                         INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #73844 / 73845#
NO MATCH: #78787 / 78788#                 INCORRECT
MATCH: #84856#
MATCH: #82795#
MATCH: #84857 / 82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #84845#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #83759#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84814#                      INCORRECT
MATCH: #84815#
NO MATCH: #Not live yet#
NO MATCH: #nan#
NO MATCH: #84118#                      INCORRECT
NO MATCH: #Not live yet#
NO MATCH: #84640#                      INCORRECT
MATCH: #84591#
NO MATCH: #84660#                      INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #75891 / 75892#

I expect every string containing either a single 5-digit number or a list of ' / '-separated 5-digit numbers to match, but the ones I have marked 'INCORRECT' above do not.

Why isn't this working as expected?

`f'[0-9]{5}'` evaluates to `'[0-9]5'`. Why are you using an f-string for a regex? You want a raw string, which also avoids issues with double escaping: `r'[0-9]{5}'` — Giacomo Alzetta, Jan 30 '20 at 08:49

score 0 · Accepted Answer · answered Jan 30 '20 at 08:49

Because this:

>>> f'[0-9]{5}'
'[0-9]5'
>>> r'[0-9]{5}'
'[0-9]{5}'

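Inside an f-string, `{5}` is a replacement field, so it is replaced by the value of the expression `5` and the quantifier is lost. The resulting pattern `[0-9]5` matches any digit followed by a literal `5`, which is why IDs such as 82795 matched while 83046 did not. f-strings are only meant for formatting. Always use raw strings (`r'...'`) for regexes, so that braces stay literal and backslashes do not need double escaping.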
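As a minimal sketch of the fix, the snippet below compares the two patterns on a few of the values from the question. The names `BROKEN`, `FIXED` and `has_tag_id` are made up for this illustration and are not from the original code. The helper returns a boolean instead of printing.

```python
import re

BROKEN = f'[0-9]{5}'   # {5} is formatted as the value 5, giving '[0-9]5'
FIXED = r'[0-9]{5}'    # raw string: {5} stays a regex quantifier


def has_tag_id(id_string):
    """Return True if id_string contains a run of at least five digits."""
    # str() mirrors the isinstance check in the question (handles NaN etc.)
    return re.search(FIXED, str(id_string)) is not None


# Compare both patterns on values taken from the question's output
for value in ['83046', '78787 / 78788', '82795', 'Not live yet', float('nan')]:
    broken_hit = re.search(BROKEN, str(value)) is not None
    print(f'#{value}#  broken={broken_hit}  fixed={has_tag_id(value)}')
```

Note that `re.search` only requires five digits somewhere in the string, so a longer run of digits would also count as a match.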

Giacomo Alzetta


score 0 · Answer 2 · answered Jan 30 '20 at 08:50

I just realised I had accidentally written the pattern passed to re.search as an f-string instead of a raw string, so it was searching for strings containing a match for '[0-9]5' (any digit followed by a literal 5).
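If a value really does need to be interpolated into the pattern, one option is to combine the raw and f-string prefixes and double the braces that should reach the regex engine. The sketch below assumes a hypothetical variable `n_digits`. It also shows a stricter check with `re.fullmatch`, which may be closer to the "single ID or ' / '-separated list" requirement than `re.search`. The names `pattern` and `strict` are illustrative only.

```python
import re

n_digits = 5  # hypothetical: number of digits in a tag ID

# '{{' and '}}' produce literal braces; '{n_digits}' is interpolated
pattern = rf'[0-9]{{{n_digits}}}'
print(pattern)  # [0-9]{5}

# Stricter variant: the whole string must be one ID or a ' / '-separated list
strict = re.compile(rf'[0-9]{{{n_digits}}}(?: / [0-9]{{{n_digits}}})*')

for value in ['83046', '73844 / 73845 / 84136', 'Not live yet', '123456']:
    print(f'#{value}#  valid={strict.fullmatch(value) is not None}')
```

With `fullmatch`, a six-digit string such as '123456' is rejected, whereas `re.search(r'[0-9]{5}', ...)` would accept it.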

KOB

