I'm playing around with pandas and trying to fill the NaN values in some columns with 0 (while leaving the other columns untouched).

Here's what I'm trying:

variablesToCovertToZero = ['column1', 'column2'] #just a list of columns
print('before ', df.isna().sum().sum()) #show me how many nulls
# df = df.update(df[variablesToCovertToZero].fillna(0, inplace=True)) #try 1, didn't work
df[variablesToCovertToZero].fillna(0, inplace=True) #try 2, also didn't work
print('after ', df.isna().sum().sum())

Results when I run it:

before  11056930
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py:4259: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  **kwargs
after  11056930

The before and after counts are the same! I am also getting a warning. In the past this warning wasn't a problem, but I'm including it in case it's related.

Any suggestions on what I'm doing wrong? I just want to use fillna on a specific list of columns.

Lostsoul
  • Have you seen this [post](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas)? – Balaji Ambresh Jul 05 '20 at 16:57
  • @BalajiAmbresh I did, but I wasn't sure if it was connected or just a warning. Is the warning what's preventing the NaNs from being filled? – Lostsoul Jul 05 '20 at 17:00
  • @Lostsoul I think the problem is using inplace=True on a subset of the dataframe. If you do `df[variablesToCovertToZero] = df[variablesToCovertToZero].fillna(0)` without inplace, it works well. Otherwise, if you want to fill only some columns and still use inplace, you can do `df.fillna({col:0 for col in variablesToCovertToZero }, inplace=True)` – Ben.T Jul 05 '20 at 17:21
  • @Ben.T worked like a charm. Can you post that as an answer so I can accept it? – Lostsoul Jul 05 '20 at 17:27

1 Answer

The problem is using inplace=True on a subset of the dataframe. df[variablesToCovertToZero] returns a copy, so fillna modifies that copy rather than df itself; this is what raises the warning and leaves the NaNs unfilled. If you do:

df[variablesToCovertToZero] = df[variablesToCovertToZero].fillna(0)

and don't use inplace, it works well. Otherwise, if you want to fill only some columns and still use inplace, you can pass a dictionary that maps each column to the value you want to fill it with:

df.fillna({col:0 for col in variablesToCovertToZero }, inplace=True)
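
To see the difference side by side, here is a minimal sketch on a small made-up dataframe. The data, the `make_df` helper, and the `cols` name are assumptions for illustration only, not taken from the question. It runs the original attempt and the two fixes above, then counts the remaining NaNs for each.

```python
import warnings

import numpy as np
import pandas as pd


def make_df():
    # Hypothetical sample data: 4 NaNs in total, 3 of them in column1/column2.
    return pd.DataFrame({
        "column1": [1.0, np.nan, 3.0],
        "column2": [np.nan, np.nan, 6.0],
        "column3": [np.nan, 8.0, 9.0],
    })


cols = ["column1", "column2"]  # columns whose NaNs should become 0

# Attempt from the question: fillna runs in place on the copy returned by
# df_a[cols], so df_a itself is left unchanged.
df_a = make_df()
with warnings.catch_warnings():
    # Depending on the pandas version, this line emits a chained-assignment
    # style warning; it is silenced here only to keep the demo output clean.
    warnings.simplefilter("ignore")
    df_a[cols].fillna(0, inplace=True)
nulls_a = int(df_a.isna().sum().sum())

# Fix 1: assign the filled subset back to the same columns.
df_b = make_df()
df_b[cols] = df_b[cols].fillna(0)
nulls_b = int(df_b.isna().sum().sum())

# Fix 2: call fillna on the whole dataframe with a per-column dict.
df_c = make_df()
df_c.fillna({col: 0 for col in cols}, inplace=True)
nulls_c = int(df_c.isna().sum().sum())

print("original attempt:", nulls_a)
print("assign back:     ", nulls_b)
print("dict + inplace:  ", nulls_c)
```

With this sample data, the original attempt should leave the NaN count where it started, while both fixes should leave only the NaN in column3, which is not in the list.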
Ben.T