1

df

Hello Community,

in my dataframe i have three columns as follow:

restaurant_id, annee, moyenne_etoile_annee.

for each restaurant_id i would like to substract the "moyenne_etoiles_annee' of the last year minus the one from the first year.

By example for the first restaurant_id df[diff]=moyenne_etoiles_annee(2017)-moyenne_etoiles_annee(2015) 2-2.66=-0.66

I tried to .agg and select first and last value but wasn't able to execute properly.

Sample:

df = pd.DataFrame({"restaurant_id": [1,1,2,2],
                   "annee": [2015,2016,2019,2022],
                   "moyenne_etoile_annee": [1.2,1.4,1.3,1.3]}) 
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Aris
  • 17
  • 3
  • Please post the data as code (text) so it's easy for us to copy to test a possible solution. Good job though, on giving us input and expected output, and explain what you tried. – Allan Wind Feb 25 '21 at 05:23
  • i just created some sample data replicating df = pd.DataFrame({"A": [1,1,2,2], "B": [2015,2016,2019,2022],"C": [1.2,1.4,1.3,1.3]}) print (df) – Aris Feb 25 '21 at 05:29
  • @Aris - Super, added to quetion. – jezrael Feb 25 '21 at 05:31

1 Answers1

1

Solution working well if unique years per restaurant_id like in question data.

First use DataFrame.sort_values per both columns (if necessary) for first and last annee per restaurant_id, then subtract first and last value per restaurant_id in GroupBy.transform:

df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform('last').sub(g.transform('first'))

Alternative with lambda function:

df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform(lambda x: x.iat[-1] - x.iat[0])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • if the data frame has duplicate rows for the same year then we won't get proper results – Jethen Feb 25 '21 at 05:34
  • @Jethen - hmmm, your answer has same problem. – jezrael Feb 25 '21 at 05:36
  • 1
    i just tested the first solution and it worked. Perfect thanks – Aris Feb 25 '21 at 05:39
  • @Jethen - So if there are duplicated year per `restaurant_id` there is same problem - not seelcted first and last value. If you think my solution not working per `restaurant_id` then you are wrong, because using `groupby` – jezrael Feb 25 '21 at 05:49