1

This dataframe is given:

data = {'dates': ['01.05.2020', '01.05.2020', '01.05.2020', '02.05.2020', '02.05.2020', '03.05.2020', '03.05.2020'], 
        'amount' : [500,1000, 2000,3000,1000,4000, 2300], 'state' : ['ABC', 'ERT', 'ABC', 'XYZ', 'ERT' , 'ZEQ', 'ERT']}

enter image description here

In the column state we have four differerent values (ABC, ERT, XYZ & ZEQ). Now my goal is to add new columns for the amount per state in percentage for each day. Hence, we need to group by dates and then calculate the daily amount in percentage for each state.

So e.g. the first line of the new data frame should look like: enter image description here

How can I do this in the easiest way?

Minfetli
  • 303
  • 3
  • 12

1 Answers1

1

Use DataFrame.pivot_table for pivoting with sum and then divide by summed values per rows:

df = pd.DataFrame(data)

df1 = df.pivot_table(index='dates', 
                    columns='state', 
                    values='amount', 
                    fill_value=0, 
                    aggfunc='sum')
df1 = df1.div(df1.sum(axis=1), axis=0).round(2)
print (df1)
state        ABC   ERT   XYZ   ZEQ
dates                             
01.05.2020  0.71  0.29  0.00  0.00
02.05.2020  0.00  0.25  0.75  0.00
03.05.2020  0.00  0.37  0.00  0.63
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • One question regarding the function "div". I understand that with sum(axis=1) we add up the values in each column (in the first line 0.71 +0.29 +0+0). However, what is the second attribute (axis=0) for? – Minfetli Aug 24 '20 at 15:10
  • @Minfetli You can check [this](https://stackoverflow.com/q/53217607/2901002) – jezrael Aug 24 '20 at 16:56