
I am trying to automate the transformation of a source file that contains the * character in one of the columns as opposed to a null value (why they did this I have no idea).

I have data like the following, and I want to remove the * and have nulls in COL_2:

COL_1 | COL_2
XYZ   | *
XYZ   | 123

I read the data with

df = pd.read_csv(my_file, names=["COL_1", "COL_2"], header=0)

and do the replace with

df['COL_2'] = df['COL_2'].replace('*', '', regex=True)

but I get an error like this:

line XXX, in <module> df['COL_2'] = df['COL_2'].replace('*', '', regex=True)

line XXX, in replace return super().replace(

Thanks and sorry if this is asked somewhere I had trouble finding it.

edit: The issue was passing the regex argument, which I had been doing in all my other df.replace lines as string replaces. The * is a regex special character, so this should have been a plain df.replace, since it's an exact match rather than a pattern replace; the regex did not work with that special character.
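
For reference, a minimal sketch of the exact-match approach described in the edit above (the sample data is recreated inline here rather than read from the file; using np.nan gives real nulls in COL_2):

import numpy as np
import pandas as pd

# Sample data matching the question; normally this would come from read_csv
df = pd.DataFrame({'COL_1': ['XYZ', 'XYZ'], 'COL_2': ['*', '123']})

# A plain (non-regex) replace compares '*' literally against the whole cell
# value, so no escaping is needed; np.nan produces actual nulls in COL_2
df['COL_2'] = df['COL_2'].replace('*', np.nan)

print(df)
#   COL_1 COL_2
# 0   XYZ   NaN
# 1   XYZ   123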

JGH_PC
    you don't need regex here as `*` is a special character, use `df['col'].str.replace('*','')` `regex` is set to False by default in `str.replace` – Umar.H Mar 17 '21 at 19:42
  • That worked! Thank you! – JGH_PC Mar 17 '21 at 19:46
  • Do you want to replace it with '' or np.nan ? You can also give None instead of `''` – Joe Ferndz Mar 17 '21 at 19:49
  • Does this answer your question? [Pandas: replace substring in string](https://stackoverflow.com/questions/38565849/pandas-replace-substring-in-string) – Umar.H Mar 17 '21 at 19:52
  • I believe the issue was passing the regex argument when it wasn't needed. Removing that from the replace seems to have worked. Thank you. – JGH_PC Mar 17 '21 at 20:06
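
As the comments note, * is a regex special character. As an aside (not the fix used in this thread), if a regex replace really were needed, the character would have to be escaped so it matches a literal asterisk, roughly like this:

import re
import pandas as pd

df = pd.DataFrame({'COL_1': ['XYZ', 'XYZ'], 'COL_2': ['*', '123']})

# re.escape turns '*' into '\*', which the regex engine treats as a literal asterisk
df['COL_2'] = df['COL_2'].str.replace(re.escape('*'), '', regex=True)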

1 Answer

import pandas as pd
d = {'COL_1': ['XYZ', 'XYZ'], 'COL_2': ['*', '123']}
df = pd.DataFrame(data=d)
df
Out[3]: 
  COL_1 COL_2
0   XYZ     *
1   XYZ   123

You just need to replace it using the str.replace method.

df['COL_2'] = df['COL_2'].str.replace('*', '', regex=False)
df
Out[8]: 
  COL_1 COL_2
0   XYZ      
1   XYZ   123
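
A possible follow-up, sketched here rather than taken from the answer: str.replace leaves an empty string where the * used to be. If actual nulls are wanted, as the question and comments suggest, the empty strings can be mapped to NaN afterwards:

import numpy as np

# Turn the leftover empty strings in COL_2 into real nulls (NaN)
df['COL_2'] = df['COL_2'].replace('', np.nan)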
itwasthekix