I receive a vector of scores from our system that generally contains numeric values (stored as character strings of the form 00XXX) or No Hit statements. Sometimes processing errors produce strings that mix letters and digits (e.g. 00F69), which cause errors in a later process. I would like to replace those with blanks, which are also a valid entry. I assume a regular expression (RE) is the right tool, but so far I haven't gotten the pattern right.

This is my first attempt, which fixes the 2 problematic values in a subset by hard-coding them. In our batch data other patterns can come up, so I want something more robust.

import pandas as pd
df = pd.DataFrame(data = {'Score': ['00599', 'NO HIT', '00800', '00B66', '00750', '0010E', '00900', '']})
df["Score"] = df["Score"].replace("00B66", "") 
df["Score"] = df["Score"].replace("0010E", "") 

df

My attempt with RE below doesn't seem to work: the column does not change.

import re
df = pd.DataFrame(data = {'Score': ['00599', 'NO HIT', '00800', '00B66', '00750', '0010E', '00900', '']})
regex = '^(?=.*[0-9]$)(?=.*[a-zA-Z])'
df['Score2']= [re.sub(regex, '', str(x)) for x in df['Score']]
df
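One likely reason the column does not change: the pattern above consists only of lookaheads, which are zero-width, so `re.sub` can only ever replace an empty match with an empty string. The first lookahead also requires the string to end in a digit, so a value like 0010E would never match. Below is a minimal sketch of one possible fix, assuming a "bad" score is any string containing at least one digit and at least one letter (that definition is an assumption, not something confirmed by the batch data). It builds a boolean mask with `Series.str.contains` and blanks the flagged rows with `Series.mask`; the name `corrupted` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame(data={'Score': ['00599', 'NO HIT', '00800', '00B66',
                                  '00750', '0010E', '00900', '']})

# Assumption: a corrupted score mixes digits and letters.
# Each lookahead scans the whole string from the start, so the order
# of the letter and the digit does not matter.
pattern = r'^(?=.*\d)(?=.*[A-Za-z])'

# True for '00B66' and '0010E'; False for pure digits, 'NO HIT', and ''.
corrupted = df['Score'].str.contains(pattern, regex=True)

# Replace flagged values with a blank and keep everything else unchanged.
df['Score2'] = df['Score'].mask(corrupted, '')

print(df)
```

If a single substitution is preferred, appending `.*$` to the pattern gives the regex something to consume, so `df['Score'].str.replace(pattern + '.*$', '', regex=True)` should blank the same rows.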
DarknessFalls
