0

I have a pandas dataframe filled with accounting records that I would like to groupby -> records, but I can't quite figure out which methods to use.

Here is the original dataframe:

person_id|statement_date|expense_type|amount
--------------------------------------------
1        |1/1           |coffee      |5.00
1        |1/1           |bagel       |3.00
1        |1/1           |ticket      |8.00
1        |2/1           |coffee      |5.00
1        |2/1           |donut       |4.00
1        |2/1           |ticket      |8.00
2        |1/1           |coffee      |5.00
2        |1/1           |bagel       |3.00
2        |1/1           |ticket      |8.00
2        |2/1           |coffee      |5.00
2        |2/1           |donut       |4.00
2        |2/1           |ticket      |8.00

Instead of each row containing a single expense line item, I'd like to position expenses as columns, and each record being a person_id + statment_date, like so:

person_id|statement_date|coffee|bagel|ticket|donut
--------------------------------------------------
1        |1/1           |5.00  |3.00 |8.00  |na
1        |2/1           |5.00  |na   |8.00  |3.00
2        |1/1           |5.00  |3.00 |8.00  |na
2        |2/1           |5.00  |na   |8.00  |3.00

The following groupby gets close:

df.groupby(['person_id', 'statement_date','expense_type']).sum()

But I would like the last item, 'expense_type' to become cols.

I'm sure there is some sort of apply that I could use, but I just wasn't able to figure this one out.

Thank you!

ZAR
  • 2,550
  • 4
  • 36
  • 66
  • 1
    What you are looking for is a pivot table. Learn more here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html - I will try to provide a solution in the mean time. – Daniel Feb 07 '20 at 17:48
  • 1
    `df.pivot_table(index=['person_id','statement_date'],columns='expense_type',values='amount')` ? – anky Feb 07 '20 at 17:49
  • Thank you both! I believe you are right, I will try that! – ZAR Feb 07 '20 at 17:50
  • Converting to pivot table, and then using `.to_records()` and back into dataframe is exactly what I needed. Thank you both! – ZAR Feb 07 '20 at 17:55

0 Answers0