How to merge multiple row into one row based on two columns

Question

I have a data frame that has 10 columns and 1000 rows. I am trying to merge two specific rows into one single row based on date and ID The data frame is like this

Id Name Product Date revenue

1 John  P1      2022-02-01 50
1 John  P2     2022-02-01 10
2 Joe   P1     2022-01-01 60
2 Joe   P2     2022-02-01 10
2 Joe   P3     2022-03-01 20


What I want the output as 

Id Name Product Date revenue
1 John  P1,P2     2022-02-01 60
2 Joe   P1,P2,P3  2022-03-01 90

I tried `groupby`, `apply` and `ffill()`. What I am struggling with is to have products as P1,P2,P3 in one single row by their latest date. Any suggestions?

Please double check that you really intentionally undid the apparently helpful formatting improvements by another user. — Yunnosch, Apr 29 '22 at 20:54

score 0 · Answer 1 · answered Apr 29 '22 at 20:52

0

You can use multiple aggregations in a pandas groupby

df.groupby(['Id','Name']).agg({'Product':list,'Date':max,'revenue':sum})
        Product Date    revenue
Id  Name            
1   John    [P1, P2]    2022-01-01  60
2   Joe [P1, P2, P3]    2022-03-01  90

answered Apr 29 '22 at 20:52

G. Anderson

5,815
2
14
21

I tried multiple aggregations and my output only shows one product as `1 John P2 2022-02-01 10 ` – Sonal Apr 29 '22 at 21:13

score 0 · Answer 2 · answered Apr 29 '22 at 20:54

Try aggregating each column with the desired function:

#convert to datetime if needed
df["Date"] = pd.to_datetime(df["Date"])

output = df.groupby(["ID","Name"],as_index=False).agg({"Product": ",".join, 
                                                       "Date":"max",
                                                       "Revenue":"sum"})

>>> output

   ID  Name   Product       Date  Revenue
0   1  John     P1,P2 2022-02-01       60
1   2   Joe  P1,P2,P3 2022-03-01       90

How to merge multiple row into one row based on two columns

2 Answers2