I want to use re.match() to clean a pandas DataFrame so that any entry equal to 1 or 2, in any column, remains unchanged, and any other value is set to NaN.
The problem is that my function sets everything to NaN. I'm new to regular expressions, so I think I've made a mistake there.
Thanks!
import re
import numpy as np
import pandas as pd

# DATA
data = [['Bob', 10, 1], ['Bob', 2, 2], ['Clarke', 13, 1]]
my_df = pd.DataFrame(data, columns=['Name', 'Age', 'Sex'])
print(my_df)
     Name  Age  Sex
0     Bob   10    1
1     Bob    2    2
2  Clarke   13    1
# CLEANING FUNCTION
def my_fun(df):
    for col in df.columns:
        for row in df.index:
            if re.match('^\d{1}(\.)\d{2}$', str(df[col][row])):
                df[col][row] = df[col][row]
            else:
                df[col][row] = np.nan
    return df
# OUTPUT
my_fun(my_df)
   Name  Age  Sex
0   NaN  NaN  NaN
1   NaN  NaN  NaN
2   NaN  NaN  NaN
# EXPECTED/DESIRED OUTPUT
   Name  Age  Sex
0   NaN  NaN    1
1   NaN    2    2
2   NaN  NaN    1
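
For reference, here is a minimal sketch of one way the desired output might be produced. It is not the only approach. The pattern in my_fun requires one digit, a literal dot, and then two digits (for example "1.25"), so plain entries such as "1" or "2" never match it. The sketch below assumes the entries to keep are the integers 1 and 2 (a float such as 1.0 would become the string "1.0" and be dropped). The names ONE_OR_TWO and keep_ones_and_twos are hypothetical and chosen only for illustration.

```python
import re

import numpy as np
import pandas as pd

# Pattern for a string that is exactly "1" or "2".
# fullmatch() requires the whole string to match, so "10" or "12" are rejected.
ONE_OR_TWO = re.compile(r"[12]")


def keep_ones_and_twos(df):
    """Return a copy of df in which every entry other than 1 or 2 is NaN."""
    # Build a boolean mask with the same shape as df, one column at a time.
    mask = df.apply(
        lambda col: col.map(lambda v: ONE_OR_TWO.fullmatch(str(v)) is not None)
    )
    # where() keeps entries whose mask value is True and sets the rest to NaN.
    return df.where(mask)


data = [['Bob', 10, 1], ['Bob', 2, 2], ['Clarke', 13, 1]]
my_df = pd.DataFrame(data, columns=['Name', 'Age', 'Sex'])
cleaned = keep_ones_and_twos(my_df)
print(cleaned)
```

Columns that receive NaN may be converted to a float or object dtype, so a kept 2 can print as 2.0. If the regular expression is not essential, my_df.where(my_df.isin([1, 2])) should give the same result without it.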