0

Consider the following dataframe:

In [279]: edf = pd.DataFrame({
     ...:     'person': [1, 1, 1, 2, 2],
     ...:     'a': [1, 2, np.nan, 5, 10],
     ...:     'b': [1, np.nan, np.nan, 7, 15]
     ...: })

In [280]: edf
Out[280]: 
   person     a     b
0       1   1.0   1.0
1       1   2.0   NaN
2       1   NaN   NaN
3       2   5.0   7.0
4       2  10.0  15.0

I need to create two new columns named c and d which values are mapped from columns a and b. However, it seems that using .loc does not work in my case:

In [281]: edf.loc[:, ['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')

In [282]: edf
Out[282]: 
   person     a     b   c   d
0       1   1.0   1.0 NaN NaN
1       1   2.0   NaN NaN NaN
2       1   NaN   NaN NaN NaN
3       2   5.0   7.0 NaN NaN
4       2  10.0  15.0 NaN NaN

It works if I assign the values without .loc:

In [283]: edf[['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')

In [284]: edf
Out[284]: 
   person     a     b     c     d
0       1   1.0   1.0   1.0   1.0
1       1   2.0   NaN   2.0   1.0
2       1   NaN   NaN   2.0   1.0
3       2   5.0   7.0   5.0   7.0
4       2  10.0  15.0  10.0  15.0

However, I'm receiving this warning from Jupyter Lab if I use assignments shown in In [283]. I'm wondering which operation would be the most efficient and pythonic in this case? And what are the caveats, particularly in this case, between using and not using .loc in the assignment?

/xxx/xxx/Library/Python/3.8/lib/python/site-packages/pandas/core/frame.py:3069: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]

Of course we can do this explicitly, but I prefer to have it as a one-liner:

In [286]: edf2['c'] = edf.groupby('person')['a'].fillna(method='ffill')

In [287]: edf2.loc[:, 'd'] = edf.groupby('person')['b'].fillna(method='ffill')

In [288]: edf2
Out[288]: 
   person     a     b     c     d
0       1   1.0   1.0   1.0   1.0
1       1   2.0   NaN   2.0   1.0
2       1   NaN   NaN   2.0   1.0
3       2   5.0   7.0   5.0   7.0
4       2  10.0  15.0  10.0  15.0

The complete code as follow:

In [279]: edf = pd.DataFrame({
     ...:     'person': [1, 1, 1, 2, 2],
     ...:     'a': [1, 2, np.nan, 5, 10],
     ...:     'b': [1, np.nan, np.nan, 7, 15]
     ...: })

In [280]: edf
Out[280]: 
   person     a     b
0       1   1.0   1.0
1       1   2.0   NaN
2       1   NaN   NaN
3       2   5.0   7.0
4       2  10.0  15.0

In [281]: edf.loc[:, ['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')

In [282]: edf // Using .loc, it produces NaN instead of values in `a` and `b`
Out[282]: 
   person     a     b   c   d
0       1   1.0   1.0 NaN NaN
1       1   2.0   NaN NaN NaN
2       1   NaN   NaN NaN NaN
3       2   5.0   7.0 NaN NaN
4       2  10.0  15.0 NaN NaN

In [283]: edf[['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')  // This assignment works as I'm expecting

In [284]: edf
Out[284]: 
   person     a     b     c     d
0       1   1.0   1.0   1.0   1.0
1       1   2.0   NaN   2.0   1.0
2       1   NaN   NaN   2.0   1.0
3       2   5.0   7.0   5.0   7.0
4       2  10.0  15.0  10.0  15.0

In [285]: edf2 = pd.DataFrame({
     ...:     'person': [1, 1, 1, 2, 2],
     ...:     'a': [1, 2, np.nan, 5, 10],
     ...:     'b': [1, np.nan, np.nan, 7, 15]
     ...: })

In [286]: edf2['c'] = edf.groupby('person')['a'].fillna(method='ffill')

In [287]: edf2['d'] = edf.groupby('person')['b'].fillna(method='ffill')

In [288]: edf2   // Explicit assignments above work as I'm expecting
Out[288]: 
   person     a     b     c     d
0       1   1.0   1.0   1.0   1.0
1       1   2.0   NaN   2.0   1.0
2       1   NaN   NaN   2.0   1.0
3       2   5.0   7.0   5.0   7.0
4       2  10.0  15.0  10.0  15.0

In [289]: edf2.loc[:, 'c'] = edf.groupby('person')['a'].fillna(method='ffill')

In [290]: edf2  // This assignment works for assigning one column `c`
Out[290]: 
   person     a     b     c     d
0       1   1.0   1.0   1.0   1.0
1       1   2.0   NaN   2.0   1.0
2       1   NaN   NaN   2.0   1.0
3       2   5.0   7.0   5.0   7.0
4       2  10.0  15.0  10.0  15.0

In [291]: edf2['e'] = edf.groupby('person')['b'].fillna(method='ffill')

In [292]: edf2  // This assignment works as intended (without .loc) but it throws SettingCopyWarning
Out[292]: 
   person     a     b     c     d     e
0       1   1.0   1.0   1.0   1.0   1.0
1       1   2.0   NaN   2.0   1.0   1.0
2       1   NaN   NaN   2.0   1.0   1.0
3       2   5.0   7.0   5.0   7.0   7.0
4       2  10.0  15.0  10.0  15.0  15.0

In [293]: edf2.loc['f'] = edf.groupby('person')['b'].fillna(method='ffill')  // Single assignment with .loc works, but with SettingCopyWarning

In [294]: edf2
Out[294]: 
   person     a     b     c     d     e
0     1.0   1.0   1.0   1.0   1.0   1.0
1     1.0   2.0   NaN   2.0   1.0   1.0
2     1.0   NaN   NaN   2.0   1.0   1.0
3     2.0   5.0   7.0   5.0   7.0   7.0
4     2.0  10.0  15.0  10.0  15.0  15.0
f     NaN   NaN   NaN   NaN   NaN   NaN
Toto Lele
  • 394
  • 2
  • 13
  • `SettingWithCopyWarning: ` is only in real data, right? In sample no error? – jezrael Jan 27 '21 at 07:25
  • `edf` is created by some filtration? – jezrael Jan 27 '21 at 07:27
  • `edf` is created for illustration purpose. Yes, I always use `.loc` to assign values, but in a multiple columns scenario, it doesn't seem to work as I am expecting (see `In [281]` where it assigns `NaN` values instead of the intended values) – Toto Lele Jan 27 '21 at 07:36
  • for me `edf[['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')` working with no error. – jezrael Jan 27 '21 at 07:37
  • So I think `SettingWithCopyWarning` cause another problem, it is unfortunately error, where is problem in some lines of code before error lines. So is necessaty use `copy` for filtration for avoid error. – jezrael Jan 27 '21 at 07:38
  • Yes, mine also works without `.loc` but it throws warning `SettingWithCopyWarning`. I tried a simple assignments as follow: `edf2.loc[:, ['c', 'd']] = edf2.loc[:, ['a', 'b']]` and it assigns `NaN` to columns `c` and `d` instead of the values from `a` and `b`. However, without using `.loc` in this simple assignment, it assigns the values as intended. I'm wondering whether I should or should not use `.loc` in this case – Toto Lele Jan 27 '21 at 07:39
  • hmm, I think `loc` is not solution for avoid `SettingWithCopyWarning`, so if not possible find problem is possible post complete your code? I can find problem. – jezrael Jan 27 '21 at 07:40
  • I says about [this](https://stackoverflow.com/questions/55204404/settingwithcopywarning-after-using-copy/55204409#55204409) or [this](https://stackoverflow.com/questions/45035929/creating-new-pandas-dataframe-from-certain-columns-of-existing-dataframe/45035966#45035966) – jezrael Jan 27 '21 at 07:43
  • Btw, maybe warning is because assign from one Dataframe to another `edf` to `edf2` e.g. here `edf2['e'] = edf.groupby('person')['b'].fillna(method='ffill')` – jezrael Jan 27 '21 at 08:47

0 Answers0