1

I have a spending dataset with 3 columns, date, id, and amount. The date column only changes by month. Example:

import pandas as pd
cols = ['date','id','amount']
data = [['01-01-2019',1,320],
        ['01-01-2019',2,1000],
        ['01-01-2019',3,0],
        ['01-02-2019',1,0],
        ['01-02-2019',2,500],
        ['01-02-2019',3,123],
        ['01-03-2019',1,657],
        ['01-03-2019',2,300],
        ['01-03-2019',3,5],
        ['01-04-2019',1,3],
        ['01-04-2019',2,800],
        ['01-04-2019',3,42],
        ['01-05-2019',1,0],
        ['01-05-2019',2,656],
        ['01-05-2019',3,6]
        ]
df = pd.DataFrame(data)
df.columns = cols

Looks like:

df.head()
         date  id  amount
0  01-01-2019   1     320
1  01-01-2019   2    1000
2  01-01-2019   3       0
3  01-02-2019   1       0
4  01-02-2019   2     500

I want to transpose this dataset so that each row is for one id, and the columns are the amounts spent per month. There can be a large number of months but its guaranteed that each month will have all users in them. Example:

Ideal outcome

   id  month 1  month 2  month 3  month 4  month 5
0   1      320        0      657        3        0
1   2     1000      500      300      800      656
2   3        0      123        5       42        6

I have a rather inefficient way to do this now, and am sure theres a more panda-esque way to do this. probably using groupby or something but I cant figure out how. Would anyone have any idea?

Current method

ids = df['id'].unique()
for uniq_id in ids:
    sub_amt = df[df['id'] == uniq_id]['amount'].to_list()
    temp = [uniq_id] + sub_amt
    master.append(temp)

transposed_df = pd.DataFrame(master)
transposed_df.columns= ['id'] + ['month {}'.format(x) for x in range(1,len(sub_amt)+1)]

Thank you!

-- EDIT --

I know the question got marked as duplicate, but to save you a click and trouble of fitting those examples to this specific question, I solved this question by using df.pivot(index='id', columns='date',values ='amount')

Wboy
  • 2,452
  • 2
  • 24
  • 45
  • Maybe with pivot? – Celius Stingher Nov 28 '19 at 15:22
  • How though? I still cant figure it out after reading the post – Wboy Nov 28 '19 at 15:35
  • Like this! df['date'] = pd.to_datetime(df['date'],format='%d-%m-%Y') df['month'] = df['date'].dt.month df_f = pd.pivot_table(df,values='amount',index='id',columns='month',aggfunc=np.sum) If you open a new question, let me know and I'll paste it as an answer with the expected output. – Celius Stingher Nov 28 '19 at 15:42
  • 1
    Thanks for that! i just figured it out: `df.pivot(index='id', columns='date',values ='amount')` but yours works too! – Wboy Nov 28 '19 at 15:45
  • Don't forget the agg function! I find pivot_table a bit more flexible, but of course your function also gets the job done :) – Celius Stingher Nov 28 '19 at 15:45
  • pd.pivot(df.groupby([df["date"].str.extract(r"-(\d{2})-")[0],"id"]).sum().reset_index(), index="id", columns=0, values="amount") – Fourier Nov 28 '19 at 15:47
  • Wait why would I need the aggregate function? I get the same results @IvanLibedinsky – Wboy Nov 28 '19 at 16:02
  • Perhaps I mistankly supposed it is a requiered arguments, seems not to be the case for `pivot`. Also, the default operation for `pivot` seems to be sum. – Celius Stingher Nov 28 '19 at 16:57

0 Answers0