Consider the following DataFrame:

import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

After reading the following post, I understand that apply receives each group as a whole DataFrame, while transform works on one Series (column) at a time.

Apply vs transform on a group object

So if I want to append the total amount spent on candy per person, I can simply use the following:

candy['Total Spend'] = candy.groupby('Name')['Value'].transform('sum')

But if I need to append the total amount spent on chocolate per person, it feels like I have no choice but to build a separate DataFrame with apply and then merge it back, since transform only works on a Series.

chocolate = candy.groupby(['Name']).apply(lambda x: x[x['Candy'] == 'Chocolate']['Value'].sum()).reset_index(name = 'Total_Chocolate_Spend')
candy = pd.merge(candy, chocolate, how = 'left',left_on=['Name'], right_on=['Name'])

While I don't mind writing the above code to solve this problem, is it possible to 'transform' the applied results back onto the DataFrame without creating a separate DataFrame and merging it?

What is actually happening when transform is used? Is a separate Series stored in memory and then merged back on the index, similar to what I have done with the apply-then-merge approach?

Rigel

2 Answers

There are other methods. For example:

Create a temporary column that holds only the chocolate values (0 elsewhere) using Series.where, sum it per person with transform, then drop it:

candy["choc_val"] = candy["Value"].where(candy["Candy"] == "Chocolate", other=0)
candy["Total_Chocolate_Spend"] = candy.groupby("Name")["choc_val"].transform("sum")
candy = candy.drop(columns="choc_val")

Output:

     Name      Candy  Value  Total Spend  Total_Chocolate_Spend
0     Bob  Chocolate     15           40                     30
1     Bob  Chocolate     15           40                     30
2     Bob    Lollies     10           40                     30
3   Annie  Chocolate     25           67                     55
4   Annie  Chocolate     30           67                     55
5   Annie    Lollies     12           67                     55
6  Daniel  Chocolate     40           96                     80
7  Daniel  Chocolate     40           96                     80
8  Daniel    Lollies     16           96                     80

I don't know if this is more performant or easier to read.
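
A possible variation, sketched here as an assumption rather than part of the original answer: the temporary column can be skipped by grouping the masked Series by candy['Name'] directly. transform returns a result with the same index as its input, which is why it can be assigned straight back as a column without a merge. The name is_chocolate is introduced only for illustration.

```python
import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies',
              'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

# Boolean mask marking the chocolate rows (hypothetical helper name).
is_chocolate = candy['Candy'].eq('Chocolate')

# Keep Value on chocolate rows, use 0 elsewhere, then sum per person.
# Grouping a Series by another Series aligns the two on their shared index.
candy['Total_Chocolate_Spend'] = (
    candy['Value']
    .where(is_chocolate, 0)
    .groupby(candy['Name'])
    .transform('sum')
)

print(candy)
```

No intermediate column is left behind, so the drop step is not needed in this version.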

anon01
    Thanks for your answer, though I have accepted another answer by piterbarg as it preserves the original apply function. – Rigel Jan 14 '21 at 06:10
I do not have much to add to the excellent reference you provided on apply vs. transform, but you can do what you want without creating a separate DataFrame. For example:

candy.groupby(['Name']).apply(lambda x: x.assign(Total_Chocolate_Spend = x[x['Candy'] == 'Chocolate']['Value'].sum()))

This calls assign on each group in the groupby to populate Total_Chocolate_Spend with that group's chocolate total.
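
Depending on the pandas version, groupby(...).apply may add the group key to the index of the result or treat the grouping column differently, so the shape of the output from the assign-based line can vary. As a hedged alternative (not the method shown in this answer), the per-person totals can be computed as a Series and mapped back onto Name, which also avoids a separate merge. The name chocolate_totals is hypothetical.

```python
import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies',
              'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

# One total per person, computed from the chocolate rows only.
# The result is a Series indexed by Name.
chocolate_totals = (
    candy.loc[candy['Candy'] == 'Chocolate']
    .groupby('Name')['Value']
    .sum()
)

# Look up each row's Name in that Series. People with no chocolate rows
# would get NaN from map, so fill those with 0 before casting back to int.
candy['Total_Chocolate_Spend'] = (
    candy['Name'].map(chocolate_totals).fillna(0).astype(int)
)

print(candy)
```

The original row order and index are left untouched, since only a new column is assigned.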

piterbarg