Pandas Transforming the Applied Results back to the original dataframe

Question

Consider the following DataFrame:

import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

After reading the following post, I understand that apply works on the whole DataFrame of each group, while transform works on one Series at a time.

Apply vs transform on a group object

So if I want to append the total amount each person spent on candy, I can simply use the following:

candy['Total Spend'] = candy.groupby(['Name'])['Value'].transform('sum')

But if I need to append the total amount each person spent on chocolate only, it feels like I have no choice but to build a separate dataframe with apply and then merge it back, since transform only works on a Series.

chocolate = candy.groupby(['Name']).apply(lambda x: x[x['Candy'] == 'Chocolate']['Value'].sum()).reset_index(name='Total_Chocolate_Spend')
candy = pd.merge(candy, chocolate, how='left', on='Name')

I don't mind writing the above code to solve this problem, but is it possible to 'transform' the applied results back onto the dataframe without creating a separate dataframe and merging it?

What is actually happening when transform is used? Is a separate Series stored in memory and then merged back on the index, similar to what I have done with the apply-then-merge approach?
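One way to picture what transform does, offered as a conceptual sketch rather than a description of pandas internals: the result is a new Series with the same index as the original rows, and it holds the same values you would get by aggregating once per group and then looking up each row's group key. The names `via_transform`, `per_person` and `via_map` are made up for this illustration.

```python
import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

# transform returns a Series that shares the original row index,
# so assigning it to a column lines up row by row.
via_transform = candy.groupby('Name')['Value'].transform('sum')

# Conceptual equivalent: aggregate once per group (one row per Name),
# then look up each row's Name in that small per-group result.
per_person = candy.groupby('Name')['Value'].sum()
via_map = candy['Name'].map(per_person)

print(via_transform.tolist())
print(via_map.tolist())
```

Both prints should show the same per-row totals, which is why assigning the transform result to a new column needs no explicit merge.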

score 2 · Answer 1 · answered Jan 14 '21 at 05:13

There are other ways to do this. For example, create a temporary column that holds only the chocolate values using Series.where, sum it per person with transform, then drop it:

candy["choc_val"] = candy.Value.where(candy.Candy == "Chocolate", other=0)
candy["Total_Chocolate_Spend"] = candy.groupby("Name").choc_val.transform("sum")
candy = candy.drop(columns="choc_val")

Output:

     Name      Candy  Value  Total Spend  Total_Chocolate_Spend
0     Bob  Chocolate     15           40                     30
1     Bob  Chocolate     15           40                     30
2     Bob    Lollies     10           40                     30
3   Annie  Chocolate     25           67                     55
4   Annie  Chocolate     30           67                     55
5   Annie    Lollies     12           67                     55
6  Daniel  Chocolate     40           96                     80
7  Daniel  Chocolate     40           96                     80
8  Daniel    Lollies     16           96                     80

I don't know if this is more performant or easier to read.
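A possible variation on the same idea, sketched here as an assumption rather than something from the answer: the masked values can be grouped directly by the Name column, which avoids the temporary column. The name `is_choc` is made up for this example.

```python
import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies',
              'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

# Boolean mask marking the chocolate rows
is_choc = candy['Candy'] == 'Chocolate'

# Zero out non-chocolate values, group the resulting Series by the Name
# column, and broadcast each person's sum back to every one of their rows.
candy['Total_Chocolate_Spend'] = (
    candy['Value'].where(is_choc, 0)
    .groupby(candy['Name'])
    .transform('sum')
)

print(candy)
```

Grouping a Series by another Series with the same index works because pandas aligns the two on that index, so no helper column ever has to be added to `candy`.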

Thanks for your answer, though I have accepted another answer by piterbarg as it preserves the original apply function. — Rigel, Jan 14 '21 at 06:10

score 1 · Accepted Answer · piterbarg · 2021-01-14T08:23:01.513

I do not have much to add to the excellent reference you provided on apply vs. transform, but you can do what you want without creating a separate dataframe. For example:

candy.groupby(['Name']).apply(lambda x: x.assign(Total_Chocolate_Spend = x[x['Candy'] == 'Chocolate']['Value'].sum()))

This uses assign on each group produced by groupby to populate Total_Chocolate_Spend with the number you want.
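The approach above returns a new DataFrame rather than modifying `candy`. A minimal sketch of one way to write the column back onto the original frame follows. It rests on some choices that are not in the answer: `group_keys=False` is passed so the result keeps the original row index, the non-key columns are selected explicitly so the function sees the same columns regardless of how a given pandas version treats the grouping column inside apply, and the helper name `add_chocolate_total` is hypothetical.

```python
import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies',
              'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})


def add_chocolate_total(group):
    """Return the group with its chocolate total repeated on every row."""
    choc_total = group.loc[group['Candy'] == 'Chocolate', 'Value'].sum()
    return group.assign(Total_Chocolate_Spend=choc_total)


# group_keys=False keeps the original row labels instead of adding
# Name as an extra index level on the result.
per_group = (
    candy.groupby('Name', group_keys=False)[['Candy', 'Value']]
    .apply(add_chocolate_total)
)

# Assignment aligns on the row index, so each total lands on the right row.
candy['Total_Chocolate_Spend'] = per_group['Total_Chocolate_Spend']

print(candy)
```

Because the assignment aligns on the index, the order in which the groups come back from apply does not matter.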

edited Jan 14 '21 at 08:23

answered Jan 14 '21 at 05:36


That is what I am looking for! Thanks! – Rigel Jan 14 '21 at 06:09
