
I want to replace a single NaN entry in a pandas DataFrame with a string. Here is what I did:

import pandas as pd
import numpy as np

s = pd.DataFrame({'A': ['S12', 'S1', 'E53', np.nan], 'B': [1, 2, 3, 4]})

s['A'][s['A'].isnull()==True] = 'P'

This code finds the NaN value in column A and replaces it with the string 'P'. The result looks like this:

     A  B
0  S12  1
1   S1  2
2  E53  3
3    P  4

But I also get a warning like this:

/Users/grr/anaconda/bin/ipython:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # -*- coding: utf-8 -*-

Could anyone explain to me what this means and what I should do to avoid this?

Thank you!

Grr
  • Possible duplicate of [How to deal with SettingWithCopyWarning in Pandas?](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – Bubble Bubble Bubble Gut Dec 22 '17 at 22:28
  • There are many things you can improve here. But for starters... there is a .fillna() function already. – Anton vBR Dec 22 '17 at 22:30

2 Answers


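You get the warning because s['A'][...] = 'P' is chained indexing: it first selects s['A'] and then assigns into that result, and pandas cannot guarantee that the assignment reaches the original DataFrame rather than a copy. See https://stackoverflow.com/a/20627316/7386332 for more info.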

Instead you should do this:

import pandas as pd
import numpy as np

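s = pd.DataFrame({'A': ['S12', 'S1', 'E53', np.nan], 'B': [1, 2, 3, 4]})

# Assign the result back instead of calling fillna(..., inplace=True) on s.A,
# which is itself a chained operation on a selected column.
s['A'] = s['A'].fillna('P')  # alternative: s['A'] = s['A'].replace(np.nan, 'P')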

To set values explicitly, use loc. Something along the lines of:

s.loc[:, 'A'] = s.loc[:, 'A'].replace(np.nan, 'P')
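
As a quick check, the sketch below applies the three variants mentioned in this answer to separate copies of the same frame and compares the results. The variable names (original, via_fillna, via_replace, via_loc, same) are only illustrative, and each result is assigned back to the column rather than modified with inplace=True, which is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({'A': ['S12', 'S1', 'E53', np.nan], 'B': [1, 2, 3, 4]})

# Variant 1: fillna on the column, result assigned back.
via_fillna = original.copy()
via_fillna['A'] = via_fillna['A'].fillna('P')

# Variant 2: replace NaN explicitly, result assigned back.
via_replace = original.copy()
via_replace['A'] = via_replace['A'].replace(np.nan, 'P')

# Variant 3: the same replace, but read and written through loc.
via_loc = original.copy()
via_loc.loc[:, 'A'] = via_loc.loc[:, 'A'].replace(np.nan, 'P')

# True when all three frames hold identical values.
same = via_fillna.equals(via_replace) and via_fillna.equals(via_loc)
print(same)
print(via_fillna)
```

Working on copies keeps the original frame untouched, so the variants can be compared side by side.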
Anton vBR
  • Doesn’t make sense to use loc here at all really – DJK Dec 22 '17 at 22:52
  • @djk47463 Yeah.. in this case it does not make any sense. And that's why I propose to use a replace or fillna. But loc is the correct way to access and change values in dfs. – Anton vBR Dec 22 '17 at 22:54

You should use loc when setting values.

Essentially, with chained indexing there is no guarantee whether the __setitem__ call will be executed on the original DataFrame or on a copy in memory. You should really read the section mentioned in the warning message (indexing-view-versus-copy). The preferred method would be:

s.loc[s.A.isnull(), 'A'] = 'P'
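
A minimal, self-contained sketch of the same idea, using the frame from the question. The intermediate name mask is only there for readability and is not part of the original code.

```python
import numpy as np
import pandas as pd

s = pd.DataFrame({'A': ['S12', 'S1', 'E53', np.nan], 'B': [1, 2, 3, 4]})

# The chained form from the question, shown only as a comment:
#   s['A'][mask] = 'P'
# first calls s.__getitem__('A'), then __setitem__ on whatever that returned,
# which may or may not be the data held by s.

mask = s['A'].isnull()   # boolean Series, True only where A is NaN
s.loc[mask, 'A'] = 'P'   # a single __setitem__ call on s.loc, so s itself is updated

print(s)
```

Because rows and column are selected in one indexing operation, there is no intermediate object whose view-or-copy status matters.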
Grr