0

[I have not found exact answer on that question in those "duplicates", that jezrael suggests.]

I have quite complicated dataframe. I need to group it by 3 columns and then take the most recent (by columns 'date') value in EACH group.

place  customer type  date        sales
LA        m      a   2018-12-1      1
LA        m      b   2018-12-1      4
LA        m      a   2018-12-4      2
LA        m      b   2018-12-4      5
LA        m      a   2018-12-3      3
LA        m      b   2018-12-3      1
LA        m      a   2018-12-7      4
LA        m      b   2018-12-7      10
LA        f      a   2018-12-1      1
LA        f      b   2018-12-1      1
LA        f      a   2018-12-4      15
LA        f      b   2018-12-4      12
LA        f      a   2018-12-3      31
LA        f      b   2018-12-3      11
LA        f      a   2018-12-7      14
LA        f      b   2018-12-7      40
NY        m      a   2018-12-1      3
NY        m      b   2018-12-1      4
NY        m      a   2018-12-4      1
NY        m      b   2018-12-4      13
NY        m      a   2018-12-3      14
NY        m      b   2018-12-3      8
NY        m      a   2018-12-7      11
NY        m      b   2018-12-7      10

I need to group df by place, customer and type, and then in each subgroup select most recent date and corresponding sales:

place  customer type  date        sales
LA        m      a   2018-12-7      4
                 b   2018-12-7      10
          f      a   2018-12-7      40
                 b   2018-12-7      14
NY        m      a   2018-12-7      11
                 b   2018-12-7      10

I tried

gr = df.groupby([df['place'], df['customer'], df['type']]).last()

but this gives the last row, which generally speaking is not always the most recent date (dates can be shuffled)

DDR
  • 459
  • 5
  • 15
  • @jerzrael, thanks for providing link for this solution, I could come up with `df.groupby(['place'],sort=False)['sales'].max()` could you please do add how to get OP's like output, I apologies if I missed something in your link. Since didn't find it so thought to check with you once here. – RavinderSingh13 Dec 10 '18 at 09:09
  • I need max('date'), not max('sales'). – DDR Dec 10 '18 at 09:18
  • I came up with following solution: idx = df.groupby([df['place'], df['customer'], df['type']])['date'].idxmax() df_new = df.loc[idx]. Then one can groupby df_new, if needed. – DDR Dec 10 '18 at 09:20

0 Answers0