-1

I have the following function, to which I have added print statements for testing:

def parse_tag_id(id_string):
    if not isinstance(id_string, str):
        id_string = str(id_string)
    if re.search(f'[0-9]{5}', id_string):
        print(f'MATCH: #{id_string}#')    # I put the '#' around each to make sure there are no hidden whitespaces.
    else:
        print(f'NO MATCH: #{id_string}#')
    return None

I am then applying this to a column of a pandas DataFrame and am getting the following results:

MATCH: #73844 / 73845#
MATCH: #73844 / 73845#
MATCH: #83793 / 84758#
MATCH: #73844 / 73845 / 84122 / 84136#
MATCH: #73844 / 73845 / 84136#
NO MATCH: #Not live yet#
NO MATCH: #83046#                         INCORRECT
MATCH: #84120 / 82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84264#                         INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #73844 / 73845#
NO MATCH: #78787 / 78788#                 INCORRECT
MATCH: #84856#
MATCH: #82795#
MATCH: #84857 / 82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #84845#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #83759#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84814#                      INCORRECT
MATCH: #84815#
NO MATCH: #Not live yet#
NO MATCH: #nan#
NO MATCH: #84118#                      INCORRECT
NO MATCH: #Not live yet#
NO MATCH: #84640#                      INCORRECT
MATCH: #84591#
NO MATCH: #84660#                      INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #75891 / 75892#

I expect every string containing either a single 5-digit number or a list of ' / '-separated 5-digit numbers to match, but the lines I have marked 'INCORRECT' above do not.

Why isn't this working as expected?

Wiktor Stribiżew
KOB

2 Answers

0

Because this:

>>> f'[0-9]{5}'
'[0-9]5'
>>> r'[0-9]{5}'
'[0-9]{5}'

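In an f-string, {5} is a replacement field, so the expression 5 is evaluated and inserted, and the pattern becomes '[0-9]5': any digit followed by a literal 5. That is why IDs such as 83046 or 84264, which contain no digit followed by a 5, fail to match. f-strings are meant for formatting; use raw strings (r'...') for regexes so that braces and backslashes reach the regex engine unchanged.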
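A minimal sketch of the difference, using two IDs from the question. The doubled-brace variant is only relevant if the pattern really has to be built with an f-string; the variable names are illustrative.

```python
import re

broken = f'[0-9]{5}'      # {5} is a replacement field: evaluates 5 and inserts "5"
fixed = r'[0-9]{5}'       # raw string: {5} reaches the regex engine as a quantifier
escaped = f'[0-9]{{5}}'   # doubled braces yield literal braces inside an f-string

print(broken)    # [0-9]5
print(fixed)     # [0-9]{5}
print(escaped)   # [0-9]{5}

# '82795' contains "95" (a digit followed by 5), so the broken pattern matches it.
print(bool(re.search(broken, '82795')))   # True
# '83046' has no digit followed by 5, so the broken pattern misses it.
print(bool(re.search(broken, '83046')))   # False
# The raw-string pattern finds five consecutive digits in both.
print(bool(re.search(fixed, '83046')))    # True
```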

Giacomo Alzetta
0

I just realised I had accidentally written the pattern passed to re.search as an f-string instead of a raw string, so it was searching for strings containing a match for '[0-9]5', i.e. any digit followed by a 5.
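For reference, a sketch of a corrected check. It assumes a stricter whole-string match is wanted (a single 5-digit number, or several separated by ' / '), which goes slightly beyond the original re.search call; the names TAG_ID_LIST and is_tag_id_list are hypothetical.

```python
import re

# Hypothetical pattern: one 5-digit ID, optionally followed by more
# IDs separated by ' / '. The raw string keeps {5} as a regex quantifier.
TAG_ID_LIST = re.compile(r'[0-9]{5}(?: / [0-9]{5})*')

def is_tag_id_list(id_string):
    """Return True if the whole value is one or more ' / '-separated 5-digit IDs."""
    if not isinstance(id_string, str):
        id_string = str(id_string)   # e.g. NaN becomes 'nan', which does not match
    return TAG_ID_LIST.fullmatch(id_string) is not None

for value in ['83046', '78787 / 78788', '73844 / 73845 / 84136',
              'Not live yet', float('nan')]:
    print(f'{value!r}: {is_tag_id_list(value)}')
```

With re.search and r'[0-9]{5}' alone, a longer digit run such as '123456' would also count as a match; fullmatch rejects it.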

KOB