In Pandas how can I use transform and use information from other columns?

Question

I want something like R's mutate function, where I can use information from other columns. For example, I want to create a new column whose values come from first grouping the rows and then interpolating one column against another column in the same data frame. The new column gets the same value for every row in a group.

I tried to use apply with broadcasting, but it only produces NaN values.

import pandas as pd
import numpy as np

d = {'Gain': [20, 20,19,18,17,21,21,20,19,18],
     'Power':[30,31,32,33,34,33,34,35,36,37],
     'GRP':  ['A','A','A','A','A','B','B','B','B','B'],
     }
df = pd.DataFrame(data=d)

# Subtract the value of Gain from the maximum value: THIS STEP WORKS
df['dGain']=df.groupby(['GRP'])['Gain'].transform(lambda x: max(x) - x)

# DOES NOT WORK!!!
df['Pcomp']=df.groupby(['GRP']).transform(lambda x: np.interp(3,x.dGain,x.Power))

# DOES NOT WORK
df['Pcomp']=df.groupby(['GRP']).apply(lambda x: np.interp(3,x.dGain,x.Power))

I expected:

  Gain  Power GRP  Pcomp  dGain
0    20     30   A     33      0
1    20     31   A     33      0
2    19     32   A     33      1
3    18     33   A     33      2
4    17     34   A     33      3
5    21     33   B     36      0
6    21     34   B     36      0
7    20     35   B     36      1
8    19     36   B     36      2
9    18     37   B     36      3

It's not clear to me how you calculate `Pcomp`? Is it just the Power with index 3 within the group? — rpanai, Jun 24 '19 at 19:42
Thanks for your response, rpanai. Sorry, I did not make that clear. The interpolation is done at dGain=3, so it is basically an interpolation of Power vs. dGain evaluated at dGain=3. In R this is so easy with mutate. – Amit, Jun 24 '19 at 19:44
If I take the first group `df1 = df[df["GRP"]=="A"]` then `np.interp(3, df1.dGain, df1.Power)` is equal to `34` not `33` — rpanai, Jun 24 '19 at 19:48

score 2 · Accepted Answer · answered Jun 24 '19 at 19:46

transform is close to mutate in R's dplyr, but the two differ slightly: on a groupby object, transform passes the function one column at a time, whereas mutate can work with several columns at once.

A quick fix

df['Pcomp']=df.groupby('GRP').apply(lambda x: np.interp(3,x['dGain'],x['Power'])).reindex(df.GRP).values
df
Out[828]: 
   Gain  Power GRP  dGain  Pcomp
0    20     30   A      0   34.0
1    20     31   A      0   34.0
2    19     32   A      1   34.0
3    18     33   A      2   34.0
4    17     34   A      3   34.0
5    21     33   B      0   37.0
6    21     34   B      0   37.0
7    20     35   B      1   37.0
8    19     36   B      2   37.0
9    18     37   B      3   37.0
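
To see why the one-liner works, it may help to split it into steps. The sketch below rebuilds the frame from the question; the intermediate names `per_group`, `expanded` and `pcomp_via_map` are made up here for illustration. `apply` returns one interpolated value per group, indexed by `GRP`; `reindex` repeats each value once per row; `.values` drops that index so the assignment is positional. For a single key, mapping the `GRP` column through the per-group Series should give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Gain':  [20, 20, 19, 18, 17, 21, 21, 20, 19, 18],
    'Power': [30, 31, 32, 33, 34, 33, 34, 35, 36, 37],
    'GRP':   ['A'] * 5 + ['B'] * 5,
})
df['dGain'] = df.groupby('GRP')['Gain'].transform(lambda x: x.max() - x)

# Step 1: one interpolated value per group, indexed by the group key.
# Selecting the two columns first keeps the grouping column out of the lambda.
per_group = df.groupby('GRP')[['dGain', 'Power']].apply(
    lambda g: np.interp(3, g['dGain'], g['Power'])
)

# Step 2: repeat each group's value once per original row
# (the index of `expanded` is now A, A, ..., B, B).
expanded = per_group.reindex(df['GRP'])

# Step 3: assign by position; .values discards the GRP index,
# so pandas does not try to align it with df's 0..9 index (which gives NaN).
df['Pcomp'] = expanded.values

# Alternative for a single key: look up each row's group in per_group.
pcomp_via_map = df['GRP'].map(per_group)
```

Skipping `.values` in step 3 is what produces the NaN column from the question: the per-group result is indexed by `A`/`B`, while `df` is indexed by row number.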

BENY

Hey @WeNYoBen that really worked!! Thanks so much. But how did it work? What does reindex really do in this case? How did you get away with apply returning another sized array, and without having to resize it to fit the parent data frame? – Amit Jun 24 '19 at 19:55
@Amit apply aggregates the function for each group key, so it outputs a single value per unique key; then we just need to expand those values back to the original rows – BENY Jun 24 '19 at 20:02
Got it - Thanks for the response. – Amit Jun 24 '19 at 20:03
It would be great to see whether it is possible to use `transform` only. – rpanai Jun 24 '19 at 20:19
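
One way to stay within `transform`, offered as a sketch rather than an established idiom: transform a single column and use each group's index to pull the matching rows of the other column from the outer frame. This assumes `df` has a unique index, and it relies on `transform` broadcasting the scalar returned by `np.interp` across the group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Gain':  [20, 20, 19, 18, 17, 21, 21, 20, 19, 18],
    'Power': [30, 31, 32, 33, 34, 33, 34, 35, 36, 37],
    'GRP':   ['A'] * 5 + ['B'] * 5,
})
df['dGain'] = df.groupby('GRP')['Gain'].transform(lambda x: x.max() - x)

# transform hands the lambda only one column (dGain) per group, so the
# matching Power values are looked up in df through the group's index.
# Assumption: df's index is unique, otherwise df.loc would return extra rows.
df['Pcomp'] = df.groupby('GRP')['dGain'].transform(
    lambda s: np.interp(3, s, df.loc[s.index, 'Power'])
)
```

The lambda closes over `df`, so this is less self-contained than the `apply` version and is probably best kept for quick interactive work.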
In R that is possible with a single mutate command. I am sorry, but Python is simply not straightforward to use in this respect. Imagine I had multiple groups: how would you reindex multiple groups? – Amit Jun 24 '19 at 20:36
`df.groupby(['col1','col2']).apply(lambda x: np.interp(3,x['dGain'],x['Power'])).reindex(pd.MultiIndex.from_frame(df[['col1','col2']])).values` @Amit – BENY Jun 24 '19 at 20:42
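
For several grouping keys, joining on the key columns is an alternative to building a `MultiIndex` by hand. The sketch below uses made-up data with the placeholder column names `col1` and `col2` from the comment above; `per_group`, `out` and `via_reindex` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four (col1, col2) groups with two rows each.
df = pd.DataFrame({
    'col1':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'col2':  ['x', 'x', 'y', 'y', 'x', 'x', 'y', 'y'],
    'dGain': [0, 4, 0, 6, 2, 4, 0, 3],
    'Power': [30, 34, 10, 16, 20, 30, 40, 43],
})

# One interpolated value per (col1, col2) pair, as a named Series
# whose MultiIndex levels are called col1 and col2.
per_group = (
    df.groupby(['col1', 'col2'])[['dGain', 'Power']]
      .apply(lambda g: np.interp(3, g['dGain'], g['Power']))
      .rename('Pcomp')
)

# Left-join the per-group values back onto the rows via the key columns.
out = df.join(per_group, on=['col1', 'col2'])

# The reindex approach from the comment, for comparison.
via_reindex = per_group.reindex(
    pd.MultiIndex.from_frame(df[['col1', 'col2']])
).values
```

The join keeps the original row order and index, so `out` can replace `df` directly.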
Man @WeNYoBen you have answers for everything! That is great! Thanks again. – Amit Jun 24 '19 at 20:55