Replace multiple values to missing value (None) in Pandas

Question

I have a data set d that contains missing values in different forms:

 d = {'col1': [1, 2, '', 'N/A', 'unknown', None], 
      'col2': [3, 4, 'N/A', None, 'N/A_N/A', '']}
d = pd.DataFrame(data=d)

          col1     col2
0        1        3
1        2        4
2               N/A
3      N/A     None
4  unknown  N/A_N/A
5     None

I want to see how many values are actually missing in each column. Therefore I want to convert all empty spaces, n/a and unknowns to be None. I tried this code and got the following result:

d.replace(to_replace =['N/A', '', 'unknown', 'N/A_N/A'],  
                            value = None)

   col1  col2
0     1     3
1     2     4
2     2     4
3     2  None
4     2  None
5  None  None

I don't understand why d.replace did this, anyone have a better solution to my problem? I would like it to be like:

     col1     col2
0        1        3
1        2        4
2      None     None
3      None     None
4      None     None
5      None     None

your code works as expected your desired output is something extra on top of replacing values — gold_cy, Feb 22 '19 at 17:03
You might want to replace the native missing values with `np.NaN` instead (and then, just use `df.isna().sum()` ) — nocibambi, Feb 22 '19 at 19:03

score 5 · Accepted Answer · answered Feb 22 '19 at 17:04

5

This is known behaviour and occurs whenever the target replacement value is None. Status-bydesign, so to speak, as a consequence of how the arguments are handled.

May I suggest to_numeric?

pd.to_numeric(df.stack(), errors='coerce').unstack()

   col1  col2
0   1.0   3.0
1   2.0   4.0
2   NaN   NaN
3   NaN   NaN
4   NaN   NaN
5   NaN   NaN

Alternatively, if you pass a dictionary to replace, your code works.

# df.replace({'': None, 'N/A': None, 'N/A_N/A': None, 'unknown': None})
df.replace(dict.fromkeys(['N/A', '', 'unknown', 'N/A_N/A'], None))

   col1  col2
0   1.0   3.0
1   2.0   4.0
2   NaN   NaN
3   NaN   NaN
4   NaN   NaN
5   NaN   NaN

answered Feb 22 '19 at 17:04

cs95

379,657
97
704
746

2

You beated me in , i was just suppose to answer this dict method :-) +1 – Karn Kumar Feb 22 '19 at 17:05
@pygo Thanks, you get the next one. – cs95 Feb 22 '19 at 17:06
Thank you, it works on the example data. However, my real data contains almost only strings. Is there a way to work with strings? – Ping Feb 22 '19 at 17:07
@coldspeed, if author wants `None` literal the we need to quote the `None` other it will return NaN as shown. – Karn Kumar Feb 22 '19 at 17:10
1

Thank you, dict.fromkeys saved me! – Ping Feb 22 '19 at 17:11

Replace multiple values to missing value (None) in Pandas

1 Answers1