I have the following function, to which I have added print statements for testing:
import re

def parse_tag_id(id_string):
    if not isinstance(id_string, str):
        id_string = str(id_string)
    if re.search(f'[0-9]{5}', id_string):
        # The '#' around each value makes any hidden whitespace visible.
        print(f'MATCH: #{id_string}#')
    else:
        print(f'NO MATCH: #{id_string}#')
    return None
I am then applying this to a column of a pandas DataFrame and am getting the following results:
MATCH: #73844 / 73845#
MATCH: #73844 / 73845#
MATCH: #83793 / 84758#
MATCH: #73844 / 73845 / 84122 / 84136#
MATCH: #73844 / 73845 / 84136#
NO MATCH: #Not live yet#
NO MATCH: #83046# INCORRECT
MATCH: #84120 / 82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84264# INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #73844 / 73845#
NO MATCH: #78787 / 78788# INCORRECT
MATCH: #84856#
MATCH: #82795#
MATCH: #84857 / 82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #84845#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #83759#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84814# INCORRECT
MATCH: #84815#
NO MATCH: #Not live yet#
NO MATCH: #nan#
NO MATCH: #84118# INCORRECT
NO MATCH: #Not live yet#
NO MATCH: #84640# INCORRECT
MATCH: #84591#
NO MATCH: #84660# INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #75891 / 75892#
I expect every string containing either a single 5-digit number or a list of ' / '-separated 5-digit numbers to match, but the lines I have marked 'INCORRECT' above are reported as NO MATCH.
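The behaviour can be reproduced without pandas. The sketch below is a minimal reproduction, assuming the f-string pattern is used exactly as in parse_tag_id. The names PATTERN, has_match and samples are hypothetical helpers introduced only for this illustration, and the sample values are copied from the output above.

```python
import re

# Same f-string pattern as in parse_tag_id (hypothetical constant name).
PATTERN = f'[0-9]{5}'


def has_match(id_string):
    """Return True if PATTERN is found anywhere in the string."""
    return re.search(PATTERN, str(id_string)) is not None


# Values copied from the output above, paired with the result I expect.
samples = {
    '82795': True,
    '83046': True,
    '73844 / 73845': True,
    '78787 / 78788': True,
    'Not live yet': False,
}

results = {value: has_match(value) for value in samples}

for value, expected in samples.items():
    print(f'{value!r}: expected={expected}, got={results[value]}')

# Show the pattern string that re.search actually receives.
print(f'pattern passed to re.search: {PATTERN!r}')
```

Printing the pattern itself is a quick way to check what the regex engine actually receives once the f-string has been evaluated.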
Why isn't this working as expected?