-1

My dataframe is similar to:

   transaction       date  cash
0            1 2020-01-01    72
1            2 2020-01-03   100
2            2 2020-01-05   -75
3            3 2020-01-05    82

I want the output to group by transaction and to sum the cash for each transaction (if there is two amounts) BUT ALSO to return the later date. So for transaction 2 the end result would show transaction, date, cash as: 2,1/5/2020, 25...

Not sure how to make tables to help the visuals in my question yet so sorry, please let me know if there's any questions.

Rodalm
  • 5,169
  • 5
  • 21
Yeti
  • 9
  • 1
  • 1
    Hello, welcome to Stack Overflow. I would suggest you read this and modify your question accordingly: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Steele Farnsworth Oct 31 '21 at 02:09

2 Answers2

1

Use groupby + agg. Check the docs examples.

output = df.groupby('transaction').agg({'date': 'max', 'cash': 'sum'})
Rodalm
  • 5,169
  • 5
  • 21
0

This solution assumes that the date column is encoded as proper datetime instances. If this is not currently the case, try df['date'] = pd.to_datetime(df['date']) before doing the following.

>>> df
   transaction       date  cash
0            1 2020-01-01    72
1            2 2020-01-03   100
2            2 2020-01-05   -75
3            3 2020-01-05    82
>>> transactions = df.groupby('transaction')
>>> pd.concat((transactions['cash'].sum(), transactions['date'].max()), axis=1)
             cash       date
transaction
1              72 2020-01-01
2              25 2020-01-05
3              82 2020-01-05

transactions['date'].max() picks the date furthest into the future of those with the same transaction ID.

Steele Farnsworth
  • 863
  • 1
  • 6
  • 15