I have a spending dataset with 3 columns, date, id, and amount. The date column only changes by month. Example:
import pandas as pd
cols = ['date','id','amount']
data = [['01-01-2019',1,320],
['01-01-2019',2,1000],
['01-01-2019',3,0],
['01-02-2019',1,0],
['01-02-2019',2,500],
['01-02-2019',3,123],
['01-03-2019',1,657],
['01-03-2019',2,300],
['01-03-2019',3,5],
['01-04-2019',1,3],
['01-04-2019',2,800],
['01-04-2019',3,42],
['01-05-2019',1,0],
['01-05-2019',2,656],
['01-05-2019',3,6]
]
df = pd.DataFrame(data)
df.columns = cols
Looks like:
df.head()
date id amount
0 01-01-2019 1 320
1 01-01-2019 2 1000
2 01-01-2019 3 0
3 01-02-2019 1 0
4 01-02-2019 2 500
I want to transpose this dataset so that each row is for one id, and the columns are the amounts spent per month. There can be a large number of months but its guaranteed that each month will have all users in them. Example:
Ideal outcome
id month 1 month 2 month 3 month 4 month 5
0 1 320 0 657 3 0
1 2 1000 500 300 800 656
2 3 0 123 5 42 6
I have a rather inefficient way to do this now, and am sure theres a more panda-esque way to do this. probably using groupby or something but I cant figure out how. Would anyone have any idea?
Current method
ids = df['id'].unique()
for uniq_id in ids:
sub_amt = df[df['id'] == uniq_id]['amount'].to_list()
temp = [uniq_id] + sub_amt
master.append(temp)
transposed_df = pd.DataFrame(master)
transposed_df.columns= ['id'] + ['month {}'.format(x) for x in range(1,len(sub_amt)+1)]
Thank you!
-- EDIT --
I know the question got marked as duplicate, but to save you a click and trouble of fitting those examples to this specific question, I solved this question by using df.pivot(index='id', columns='date',values ='amount')