Pandas apply function to subset of a column in order to create a new column

Question

I have defined a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
   'A': [1,1,1,1,2,2,2,3,3],
   'B': [5,4,8,6,5,6,6,7,7]})
df
    A  B
0   1  5
1   1  4
2   1  8
3   1  6
4   2  5
5   2  6
6   2  6
7   3  7
8   3  7

I want to create a new column C that, for each value i of df.A, holds the result of scipy.signal.savgol_filter applied to the corresponding elements of B, i.e. the filter of df.loc[df.A==i].B for i=1,2,3...

I use the following code:

for i in df.A.unique() : 
    df.loc[df.A==i]['C']=scipy.signal.savgol_filter(df.loc[df.A==i].B, 3, 1)

This doesn't create the column 'C' and gives me the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I've gone through the documentation but can't find the proper way of defining that new column. Which method should I use?

Thank you for your help.

Clarification:

This issue is NOT about the scipy.signal.savgol_filter function. It would be the same with any other function that takes N elements of df.B and produces N elements to put in df.C, such as the FFT of df.loc[df.A==i].B for i=1,2,3...

Don't use chained indexing. Have you tried `df.loc[df.A==i, 'C']=scipy.signal.savgol_filter(df.loc[df.A==i, 'B'], 3, 1)`? — jpp, Jul 05 '18 at 13:08

jezrael · Accepted Answer · 2018-07-05T13:14:25.910

This is called chained indexing. It is better to use:

for i in df.A.unique() : 
    df.loc[df.A==i, 'C']=scipy.signal.savgol_filter(df.loc[df.A==i, 'B'], 3, 1)

But the best option here is GroupBy.transform:

import pandas as pd
import scipy.signal

# added a last row to the sample to avoid an error
# (group 3 had only 2 rows, fewer than the window length of 3)
df = pd.DataFrame( {
   'A': [1,1,1,1,2,2,2,3,3,3],
   'B': [5,4,8,6,5,6,6,7,7,5]})
#print (df)

df['C'] = df.groupby('A')['B'].transform(lambda x: scipy.signal.savgol_filter(x, 3, 1))    
print (df)
   A  B         C
0  1  5  4.166667
1  1  4  5.666667
2  1  8  6.000000
3  1  6  7.000000
4  2  5  5.166667
5  2  6  5.666667
6  2  6  6.166667
7  3  7  7.333333
8  3  7  6.333333
9  3  5  5.333333
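The same pattern should work for any function that returns one value per input row, which is the case raised in the question's clarification. Below is a minimal sketch using the FFT on the original nine-row sample. The helper name fft_magnitude is hypothetical, and taking the magnitude is an assumption made here only to keep column C real-valued.

```python
import numpy as np
import pandas as pd

# The question's original nine-row sample (no extra row is needed here,
# because the FFT has no minimum-length requirement).
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'B': [5, 4, 8, 6, 5, 6, 6, 7, 7],
})

def fft_magnitude(values):
    """Hypothetical helper: |FFT| of one group, one value per input row."""
    return np.abs(np.fft.fft(values.to_numpy()))

# transform calls the helper once per group of A and aligns the results
# back onto the original index.
df['C'] = df.groupby('A')['B'].transform(fft_magnitude)
print(df)
```

The only requirement on the function passed to transform in this sketch is that its output has the same length as the group it receives.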

Thanks, that was very helpful. — Mdégé, Jul 05 '18 at 13:18

score 1 · Answer 2 · answered Jul 05 '18 at 13:08

Instead of

df.loc[df.A==i]['C']

Use

df.loc[df.A==i, 'C']

With df.loc[df.A==i]['C'] you are actually modifying a copy of df, not the original DataFrame as intended.
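The difference can be checked directly. The sketch below uses np.cumsum as a stand-in for any function that maps N values to N values (an assumption for illustration, not the filter from the question) and records whether column C exists on df after each form of assignment.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'B': [5, 4, 8, 6, 5, 6, 6, 7, 7],
})
mask = df.A == 1

# Chained form: df.loc[mask] builds a new object, and ['C'] = ... writes to it.
# Warnings are silenced only to keep the demonstration quiet.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.loc[mask]['C'] = np.cumsum(df.loc[mask, 'B'])
chained_created_c = 'C' in df.columns  # the original frame is untouched

# Single .loc call: rows and column are selected together on df itself.
df.loc[mask, 'C'] = np.cumsum(df.loc[mask, 'B'].to_numpy())
loc_created_c = 'C' in df.columns

print(chained_created_c, loc_created_c)  # prints: False True
```

Rows outside the mask hold NaN in C until their own group is assigned.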

answered Jul 05 '18 at 13:08

rafaelc

57,686
15
58
82

Pandas apply function to subest of a column in order to create a new column

2 Answers2