I have a pandas dataframe filled with accounting records that I would like to groupby -> records, but I can't quite figure out which methods to use.
Here is the original dataframe:
person_id|statement_date|expense_type|amount
--------------------------------------------
1 |1/1 |coffee |5.00
1 |1/1 |bagel |3.00
1 |1/1 |ticket |8.00
1 |2/1 |coffee |5.00
1 |2/1 |donut |4.00
1 |2/1 |ticket |8.00
2 |1/1 |coffee |5.00
2 |1/1 |bagel |3.00
2 |1/1 |ticket |8.00
2 |2/1 |coffee |5.00
2 |2/1 |donut |4.00
2 |2/1 |ticket |8.00
Instead of each row containing a single expense line item, I'd like to position expenses as columns, and each record being a person_id
+ statment_date
, like so:
person_id|statement_date|coffee|bagel|ticket|donut
--------------------------------------------------
1 |1/1 |5.00 |3.00 |8.00 |na
1 |2/1 |5.00 |na |8.00 |3.00
2 |1/1 |5.00 |3.00 |8.00 |na
2 |2/1 |5.00 |na |8.00 |3.00
The following groupby gets close:
df.groupby(['person_id', 'statement_date','expense_type']).sum()
But I would like the last item, 'expense_type' to become cols.
I'm sure there is some sort of apply that I could use, but I just wasn't able to figure this one out.
Thank you!