I am trying to automate the transformation of a source file that contains the * character in one of the columns as opposed to a null value (why they did this I have no idea).
I have data like, I want to remove the * and have null's in COL_2
COL_1 | COL_2
XYZ | *
XYZ | 123
Reading data such as
df = pd.read_csv(my_file, names=["COL_1"],["COL_2"], header=0)
doing replace
df['COL_2'] = df.['COL_2'].replace('*', '', regex=True)
I get error like
line XXX, in <module> df['COL_2'] = df.['COL_2'].replace('*', '', regex=True)
line XXX, in replace return super().replace(
Thanks and sorry if this is asked somewhere I had trouble finding it.
edit: The issue was passing Regex argument which I had been doing as a string replace in all my other df.replace lines. The * is a special character and should have been just df.replace as it's an exact match not a string replace, the regex did not work with that special character.