
After a groupby, I want to aggregate with a lambda that applies a condition: use only the rows where 'product1_Gift0' == 1.

However, I can't get the answer I expect.

I need guidance on the 'margin' calculation: instead of averaging over all rows, it should average only the rows where 'product1_Gift0' equals 1.

import pandas as pd

data = [['john', 'A01', 0, 0.0], ['john', 'A01', 1, 1.0], ['john', 'A01', 1, 0.5],
        ['jess', 'B01', 0, 0.0], ['jess', 'B01', 0, 0.0], ['jess', 'B01', 1, 0.8]]

df2 = pd.DataFrame(data, columns=['member', 'orderID', 'product1_Gift0', 'margin'])

# Both lambdas compute a plain mean over every row of the group
df3 = df2.groupby('member').agg({
    'product1_Gift0': lambda x: sum(x) / len(x),
    'margin': lambda x: sum(x) / len(x),
})

actual_result = [['jess', 'B01', 0.3333, 0.27],
                 ['john', 'A01', 0.6667, 0.50]]

expected_result = [['jess', 'B01', 0.3333, 0.80],
                   ['john', 'A01', 0.6667, 0.75]]

thanks

Grzegorz Skibinski
Jonathan

1 Answer


Try this one:

>>> df2.groupby('member').apply(lambda x: pd.Series({
...     "product_Gift0_mean": x.product1_Gift0.mean(),
...     "margin_mean": sum(x.margin * (x.product1_Gift0 == 1)) / (x.product1_Gift0 == 1).sum(),
... }))
        margin_mean  product_Gift0_mean
member
jess           0.80            0.333333
john           0.75            0.666667
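
An alternative that stays with agg, as a minimal sketch rather than the answerer's method: blank out 'margin' on rows where 'product1_Gift0' is not 1 before grouping, so a plain mean skips them. It assumes the same df2 as in the question; the names masked and result are only illustrative.

```python
import pandas as pd

data = [['john', 'A01', 0, 0.0], ['john', 'A01', 1, 1.0], ['john', 'A01', 1, 0.5],
        ['jess', 'B01', 0, 0.0], ['jess', 'B01', 0, 0.0], ['jess', 'B01', 1, 0.8]]
df2 = pd.DataFrame(data, columns=['member', 'orderID', 'product1_Gift0', 'margin'])

# Keep margin only where product1_Gift0 == 1; every other row becomes NaN,
# and mean() skips NaN by default.
masked = df2.assign(margin=df2['margin'].where(df2['product1_Gift0'] == 1))

# Both columns can now use the built-in mean
result = masked.groupby('member').agg({'product1_Gift0': 'mean', 'margin': 'mean'})
print(result)
```

One caveat with this approach: a member with no 'product1_Gift0' == 1 rows gets NaN for 'margin' rather than an error.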


Grzegorz Skibinski
  • Thanks! Off topic: can you point me to a resource you use when trying to understand lambda? – Jonathan Sep 04 '19 at 09:41
  • A lambda is just a function applied to whatever input it receives (the lambda's argument is that input). You can try the basics with W3Schools: https://www.w3schools.com/python/python_lambda.asp, and the documentation of `apply` itself: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.apply.html – Grzegorz Skibinski Sep 04 '19 at 09:44
  • I tried to understand your logic. Your answer split the result into columns '0' and '1'. May I know how to achieve the 'expected result' with the 'agg' function in a one-liner? I could do it in several steps, joining both DataFrames and renaming, but I want to know how you would do it in one line. Thanks – Jonathan Sep 05 '19 at 07:19
  • @Jonathan please see the updated answer. I now use apply instead of agg, because: https://stackoverflow.com/a/21831599/11610186 . apply works with all the columns of each group at once, so I can build interdependencies between columns, which is what you wanted (as far as I understood). – Grzegorz Skibinski Sep 05 '19 at 08:14
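
The agg versus apply distinction from the last comment can be sketched as follows. With a dict passed to agg, each lambda receives a single column of one group as a Series; with apply, the lambda receives the group's sub-DataFrame. The agg variant below reaches the other column through the shared row index, which is a workaround assumed here and not part of the answer; the names via_agg and via_apply are illustrative.

```python
import pandas as pd

data = [['john', 'A01', 0, 0.0], ['john', 'A01', 1, 1.0], ['john', 'A01', 1, 0.5],
        ['jess', 'B01', 0, 0.0], ['jess', 'B01', 0, 0.0], ['jess', 'B01', 1, 0.8]]
df2 = pd.DataFrame(data, columns=['member', 'orderID', 'product1_Gift0', 'margin'])

# agg: the lambda sees only the group's 'margin' Series (s), so the condition
# on 'product1_Gift0' is looked up in df2 via the rows' original index labels.
via_agg = df2.groupby('member').agg({
    'product1_Gift0': 'mean',
    'margin': lambda s: s[df2.loc[s.index, 'product1_Gift0'] == 1].mean(),
})

# apply: the lambda sees the group's sub-DataFrame (g), so both columns
# are directly available.
via_apply = df2.groupby('member')[['product1_Gift0', 'margin']].apply(
    lambda g: g.loc[g['product1_Gift0'] == 1, 'margin'].mean()
)

print(via_agg)
print(via_apply)
```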