2

I have two multiindex series

series 1

Company Name     Product     Price        TransactionID
Company A        Apple       10           T0001
Company B        Grapes      20           T0002
                 Orange      30           T0003

series 2

Company Name     Product     Price        TransactionID
Company A        Orange      10           T0004
                 Apple       20           T0005
Company B        Orange      20           T0006

I want to form the series like below

Company Name     Product     Sum_price    TransactionID
Company A        Orange      10           [T0004]
                 Apple       30           [T0001,T0005]
Company B        Orange      50           [T0003,T0006]
                 Grapes      20           [T0002]

From Group dataframe and get sum AND count? shows that I can use .agg to do sum and count at the same time, but how to apply .apply(list) and sum together? Right now, I have applied the code Pandas: adding multiindex Series/Dataframes containing lists to form two series one is sum and another one is the one with transactionID. I think I can concat two series together, but I just want to see whether there is a better way to do it.

Platalea Minor
  • 877
  • 2
  • 9
  • 22

2 Answers2

1

You can first concat both of your dataframes like this:

df3 = pd.concat([df1, df2])

After this, you can use groupby agg to do the sum and also use a lambda function to concatenate the lists

df3 = df3.groupby(['Company', 'Name', 'Product']).agg({'Price':sum, 'TransactionID': lambda x: [i for i in x]})

You will then end up with the desired output like so:

Company Name     Product     Sum_price    TransactionID
Company A        Orange      10           [T0004]
                 Apple       30           [T0001,T0005]
Company B        Orange      50           [T0003,T0006]
                 Grapes      20           [T0002]
Zito Relova
  • 1,041
  • 2
  • 13
  • 32
1

Use concat with aggregate by aggregate with sum and converting to list:

df = (pd.concat([df1, df2])
        .groupby(['Company Name', 'Product'], as_index=False)
        .agg({'Price':'sum', 'TransactionID': lambda x: x.tolist()})
     )
print (df)
  Company Name Product  Price   TransactionID
0    Company A   Apple     30  [T0001, T0005]
1    Company A  Orange     10         [T0004]
2    Company B  Grapes     20         [T0002]
3    Company B  Orange     50  [T0003, T0006]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252