1

I have a data frame that has 10 columns and 1000 rows. I am trying to merge two specific rows into one single row based on date and ID The data frame is like this

Id Name Product Date revenue

1 John  P1      2022-02-01 50
1 John  P2     2022-02-01 10
2 Joe   P1     2022-01-01 60
2 Joe   P2     2022-02-01 10
2 Joe   P3     2022-03-01 20


What I want the output as 

Id Name Product Date revenue
1 John  P1,P2     2022-02-01 60
2 Joe   P1,P2,P3  2022-03-01 90

I tried `groupby`, `apply` and `ffill()`. What I am struggling with is to have products as P1,P2,P3 in one single row by their latest date. Any suggestions?
Sonal
  • 11
  • 3
  • Please double check that you really intentionally undid the apparently helpful formatting improvements by another user. – Yunnosch Apr 29 '22 at 20:54

2 Answers2

0

You can use multiple aggregations in a pandas groupby

df.groupby(['Id','Name']).agg({'Product':list,'Date':max,'revenue':sum})
        Product Date    revenue
Id  Name            
1   John    [P1, P2]    2022-01-01  60
2   Joe [P1, P2, P3]    2022-03-01  90
G. Anderson
  • 5,815
  • 2
  • 14
  • 21
  • I tried multiple aggregations and my output only shows one product as `1 John P2 2022-02-01 10 ` – Sonal Apr 29 '22 at 21:13
0

Try aggregating each column with the desired function:

#convert to datetime if needed
df["Date"] = pd.to_datetime(df["Date"])

output = df.groupby(["ID","Name"],as_index=False).agg({"Product": ",".join, 
                                                       "Date":"max",
                                                       "Revenue":"sum"})

>>> output

   ID  Name   Product       Date  Revenue
0   1  John     P1,P2 2022-02-01       60
1   2   Joe  P1,P2,P3 2022-03-01       90
not_speshal
  • 22,093
  • 2
  • 15
  • 30