Consider the following dataframe:
In [279]: edf = pd.DataFrame({
...: 'person': [1, 1, 1, 2, 2],
...: 'a': [1, 2, np.nan, 5, 10],
...: 'b': [1, np.nan, np.nan, 7, 15]
...: })
In [280]: edf
Out[280]:
person a b
0 1 1.0 1.0
1 1 2.0 NaN
2 1 NaN NaN
3 2 5.0 7.0
4 2 10.0 15.0
I need to create two new columns named c
and d
which values are mapped from columns a
and b
. However, it seems that using .loc
does not work in my case:
In [281]: edf.loc[:, ['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')
In [282]: edf
Out[282]:
person a b c d
0 1 1.0 1.0 NaN NaN
1 1 2.0 NaN NaN NaN
2 1 NaN NaN NaN NaN
3 2 5.0 7.0 NaN NaN
4 2 10.0 15.0 NaN NaN
It works if I assign the values without .loc
:
In [283]: edf[['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')
In [284]: edf
Out[284]:
person a b c d
0 1 1.0 1.0 1.0 1.0
1 1 2.0 NaN 2.0 1.0
2 1 NaN NaN 2.0 1.0
3 2 5.0 7.0 5.0 7.0
4 2 10.0 15.0 10.0 15.0
However, I'm receiving this warning from Jupyter Lab if I use assignments shown in In [283]
. I'm wondering which operation would be the most efficient and pythonic in this case? And what are the caveats, particularly in this case, between using and not using .loc
in the assignment?
/xxx/xxx/Library/Python/3.8/lib/python/site-packages/pandas/core/frame.py:3069: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
Of course we can do this explicitly, but I prefer to have it as a one-liner:
In [286]: edf2['c'] = edf.groupby('person')['a'].fillna(method='ffill')
In [287]: edf2.loc[:, 'd'] = edf.groupby('person')['b'].fillna(method='ffill')
In [288]: edf2
Out[288]:
person a b c d
0 1 1.0 1.0 1.0 1.0
1 1 2.0 NaN 2.0 1.0
2 1 NaN NaN 2.0 1.0
3 2 5.0 7.0 5.0 7.0
4 2 10.0 15.0 10.0 15.0
The complete code as follow:
In [279]: edf = pd.DataFrame({
...: 'person': [1, 1, 1, 2, 2],
...: 'a': [1, 2, np.nan, 5, 10],
...: 'b': [1, np.nan, np.nan, 7, 15]
...: })
In [280]: edf
Out[280]:
person a b
0 1 1.0 1.0
1 1 2.0 NaN
2 1 NaN NaN
3 2 5.0 7.0
4 2 10.0 15.0
In [281]: edf.loc[:, ['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill')
In [282]: edf // Using .loc, it produces NaN instead of values in `a` and `b`
Out[282]:
person a b c d
0 1 1.0 1.0 NaN NaN
1 1 2.0 NaN NaN NaN
2 1 NaN NaN NaN NaN
3 2 5.0 7.0 NaN NaN
4 2 10.0 15.0 NaN NaN
In [283]: edf[['c', 'd']] = edf.groupby('person')[['a', 'b']].fillna(method='ffill') // This assignment works as I'm expecting
In [284]: edf
Out[284]:
person a b c d
0 1 1.0 1.0 1.0 1.0
1 1 2.0 NaN 2.0 1.0
2 1 NaN NaN 2.0 1.0
3 2 5.0 7.0 5.0 7.0
4 2 10.0 15.0 10.0 15.0
In [285]: edf2 = pd.DataFrame({
...: 'person': [1, 1, 1, 2, 2],
...: 'a': [1, 2, np.nan, 5, 10],
...: 'b': [1, np.nan, np.nan, 7, 15]
...: })
In [286]: edf2['c'] = edf.groupby('person')['a'].fillna(method='ffill')
In [287]: edf2['d'] = edf.groupby('person')['b'].fillna(method='ffill')
In [288]: edf2 // Explicit assignments above work as I'm expecting
Out[288]:
person a b c d
0 1 1.0 1.0 1.0 1.0
1 1 2.0 NaN 2.0 1.0
2 1 NaN NaN 2.0 1.0
3 2 5.0 7.0 5.0 7.0
4 2 10.0 15.0 10.0 15.0
In [289]: edf2.loc[:, 'c'] = edf.groupby('person')['a'].fillna(method='ffill')
In [290]: edf2 // This assignment works for assigning one column `c`
Out[290]:
person a b c d
0 1 1.0 1.0 1.0 1.0
1 1 2.0 NaN 2.0 1.0
2 1 NaN NaN 2.0 1.0
3 2 5.0 7.0 5.0 7.0
4 2 10.0 15.0 10.0 15.0
In [291]: edf2['e'] = edf.groupby('person')['b'].fillna(method='ffill')
In [292]: edf2 // This assignment works as intended (without .loc) but it throws SettingCopyWarning
Out[292]:
person a b c d e
0 1 1.0 1.0 1.0 1.0 1.0
1 1 2.0 NaN 2.0 1.0 1.0
2 1 NaN NaN 2.0 1.0 1.0
3 2 5.0 7.0 5.0 7.0 7.0
4 2 10.0 15.0 10.0 15.0 15.0
In [293]: edf2.loc['f'] = edf.groupby('person')['b'].fillna(method='ffill') // Single assignment with .loc works, but with SettingCopyWarning
In [294]: edf2
Out[294]:
person a b c d e
0 1.0 1.0 1.0 1.0 1.0 1.0
1 1.0 2.0 NaN 2.0 1.0 1.0
2 1.0 NaN NaN 2.0 1.0 1.0
3 2.0 5.0 7.0 5.0 7.0 7.0
4 2.0 10.0 15.0 10.0 15.0 15.0
f NaN NaN NaN NaN NaN NaN