I have a DataFrame. To do a statistical conditional test, I split it into two based on a boolean column ('mar'). I want to use the ratio of counts between the two tables to add a column expressing the proportion of true values in the 'mar' column for each combination of the other columns, as seen below.

>>> df_nomar
   alc  cig  mar  cnt
1    1    1    0  538
3    1    0    0  456
5    0    1    0   43
7    0    0    0  279

>>> df_mar
   alc  cig  mar  cnt
0    1    1    1  911
2    1    0    1   44
4    0    1    1    3
6    0    0    1    2
>>> df_mar.loc[:, 'prop'] = np.array(df_mar['cnt'])/(np.array(df_mar['cnt']) + np.array(df_nomar['cnt']))
/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py:296: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py:476: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

>>> df_mar
   alc  cig  mar  cnt      prop
0    1    1    1  911  0.628709
2    1    0    1   44  0.088000
4    0    1    1    3  0.065217
6    0    0    1    2  0.007117

I've gone to the suggested page to investigate the warning. When I assign the new column, I am using the form df_mar.loc[:, 'prop'] = ..., just as suggested.

So why am I still getting this warning?

Brad Solomon
kingledion
  • Related: [chained-assignment](https://stackoverflow.com/questions/tagged/chained-assignment?sort=votes&pageSize=30) tag – Brad Solomon Dec 12 '17 at 16:06
  • Read this blog for a better understanding: https://www.dataquest.io/blog/settingwithcopywarning/ – Tanu Dec 12 '17 at 16:25

1 Answer

If both DataFrames have the same size, it seems you need reset_index to align the data:

a = df_mar['cnt'].reset_index(drop=True)
b = df_nomar['cnt'].reset_index(drop=True)
df_mar['prop'] = (a/(a + b)).values

Another solution is to convert both columns to NumPy arrays with .values:

a = df_mar['cnt'].values
b = df_nomar['cnt'].values
df_mar['prop'] = a / (a + b)

print(df_mar)
   alc  cig  mar  cnt      prop
0    1    1    1  911  0.628709
2    1    0    1   44  0.088000
4    0    1    1    3  0.065217
6    0    0    1    2  0.007117
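A minimal sketch of why the alignment step matters, assuming the index labels shown in the question (0, 2, 4, 6 for df_mar and 1, 3, 5, 7 for df_nomar). The Series are rebuilt by hand here for illustration; the names misaligned and by_position are not from the original code.

```python
import pandas as pd

# Stand-ins for df_mar['cnt'] and df_nomar['cnt'], with the question's index labels.
a = pd.Series([911, 44, 3, 2], index=[0, 2, 4, 6])
b = pd.Series([538, 456, 43, 279], index=[1, 3, 5, 7])

# Series arithmetic aligns on index labels. No labels match here,
# so every entry of the result is NaN.
misaligned = a / (a + b)

# Plain NumPy arrays carry no labels, so the division is done by position.
by_position = a.values / (a.values + b.values)

print(misaligned)
print(by_position)
```

Dividing by position only gives the intended ratio if both tables list the alc/cig combinations in the same order, as they do in the question.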

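Where does this pandas warning come from?

It most likely comes from the code above the lines you posted, where df_mar and df_nomar are created. If you filter a DataFrame and then modify the result, you need copy: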

df_nomar = df[df['mar'] == 0].copy()
df_mar = df[df['mar'] == 1].copy()

Without copy, if you later modify values in df_nomar or df_mar, you will find that the modifications do not propagate back to the original data (df), and pandas emits the warning.
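Putting the pieces together, here is a sketch of the whole flow. The original df is not shown in the question, so the one below is an assumption reconstructed from the two printed tables.

```python
import pandas as pd

# Assumed original DataFrame, rebuilt from the df_mar and df_nomar output above.
df = pd.DataFrame({
    'alc': [1, 1, 1, 1, 0, 0, 0, 0],
    'cig': [1, 1, 0, 0, 1, 1, 0, 0],
    'mar': [1, 0, 1, 0, 1, 0, 1, 0],
    'cnt': [911, 538, 44, 456, 3, 43, 2, 279],
})

# Take explicit copies so each subset is an independent DataFrame.
df_nomar = df[df['mar'] == 0].copy()
df_mar = df[df['mar'] == 1].copy()

# Divide by position; both subsets list the alc/cig combinations in the same order.
a = df_mar['cnt'].values
b = df_nomar['cnt'].values
df_mar['prop'] = a / (a + b)

print(df_mar)
```

The new column is added to df_mar only; df itself is left unchanged.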

jezrael
  • I converted to an np.array in my code, though not by the same method. The warning still persists using the df[index].values way. – kingledion Dec 12 '17 at 16:28
  • I tried your code and got no warning, so I guess the problem is in the lines of code above. – jezrael Dec 12 '17 at 16:47
  • Maybe you can also check [this](https://stackoverflow.com/q/20625582/2901002) or [modern pandas](http://tomaugspurger.github.io/modern-1-intro.html) from [tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html#modern-pandas) (look for header `SettingWithCopy`) – jezrael Dec 13 '17 at 11:18