I have a dataframe similar to this one:
data = {
"index": [0, 1, 2, 3, 4, 5, 6, 7, 8],
"batchId": [100, 100, 100, 100, 100, 200, 200, 200, 200],
"amount": [50, 100, 150, 200, 250, 300, 350, 400, 450],
"date": [
"2020-11-24 15:00:00",
"2020-11-24 15:00:00",
"2020-11-25 15:00:00",
"2020-11-25 15:00:00",
"2020-11-25 15:00:00",
"2020-11-25 15:00:00",
"2020-11-25 15:00:00",
"2020-11-25 15:00:00",
"2020-11-26 15:00:00",
],
}
data_frame = pd.DataFrame(data)
index batchId amount date
0 0 100 50 2020-11-24 15:00:00
1 1 100 100 2020-11-24 15:00:00
2 2 100 150 2020-11-25 15:00:00
3 3 100 200 2020-11-25 15:00:00
4 4 100 250 2020-11-25 15:00:00
5 5 200 300 2020-11-25 15:00:00
6 6 200 350 2020-11-25 15:00:00
7 7 200 400 2020-11-25 15:00:00
8 8 200 450 2020-11-26 15:00:00
I'm trying to group by the batchId
column, then sum by the amount
column and also add the latest date of the summed batch.
So far I have only managed to sum up the amounts based on the batchId
:
batch_amounts = (
data_frame["amount"].groupby(data_frame["batchId"]).sum().reset_index()
)
Which gives:
batchId amount
0 100 750
1 200 1500
How can I also add the date column that will show the latest date of each batch?
batchId amount date
0 100 750 2020-11-25 15:00:00
1 200 1500 2020-11-26 15:00:00