I have this dataframe:
import pandas as pd
dff = pd.DataFrame({"country": ["cn", "cn", "cn", "cn", "bn", "bn", "sn"], "date": ["2019-6-6", "2019-6-5", "2019-7-6", "2019-7-9", "2017-4-3", "2017-3-2", "2019-2-9"], "Revenue": [100, 200, 100, 90, 300, 400, 500], "class": ["A", "C", "B", "B", "C", "A", "B"]})
dff
country date Revenue class
0 cn 2019-6-6 100 A
1 cn 2019-6-5 200 C
2 cn 2019-7-6 100 B
3 cn 2019-7-9 90 B
4 bn 2017-4-3 300 C
5 bn 2017-3-2 400 A
6 sn 2019-2-9 500 B
I want to achieve three features:
- Dates such as 2019-6-6 are converted to a month label such as Jun-2019
- The revenue is aggregated by month
- The revenue is split into columns according to the class
The result table should look like this:
country date A B C
0 cn Jun-2019 100 0 200
1 cn Jul-2019 0 190 0
2 bn Apr-2017 0 0 300
3 bn Mar-2017 400 0 0
4 sn Feb-2019 0 500 0
To achieve feature 1,
dff.date = pd.to_datetime(dff.date)
dff.date = dff.date.dt.strftime("%b-%Y")
dff
country date Revenue class
0 cn Jun-2019 100 A
1 cn Jun-2019 200 C
2 cn Jul-2019 100 B
3 cn Jul-2019 90 B
4 bn Apr-2017 300 C
5 bn Mar-2017 400 A
6 sn Feb-2019 500 B
To achieve feature 2,
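# select Revenue explicitly; recent pandas versions would otherwise also "sum" (concatenate) the string columns
dff1 = dff.groupby(["date"])[["Revenue"]].sum()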
dff1
Revenue
date
Apr-2017 300
Feb-2019 500
Jul-2019 190
Jun-2019 300
Mar-2017 400
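Grouping by date alone discards the country and class information that the final table needs. A minimal sketch, assuming the totals are wanted per country, month and class, groups on all three columns instead. The frame below is rebuilt with the month labels from feature 1 so the snippet runs on its own; the name `totals` is only illustrative.

```python
import pandas as pd

# The frame as it looks after feature 1 (dates already converted to month labels)
dff = pd.DataFrame({
    "country": ["cn", "cn", "cn", "cn", "bn", "bn", "sn"],
    "date": ["Jun-2019", "Jun-2019", "Jul-2019", "Jul-2019",
             "Apr-2017", "Mar-2017", "Feb-2019"],
    "Revenue": [100, 200, 100, 90, 300, 400, 500],
    "class": ["A", "C", "B", "B", "C", "A", "B"],
})

# Sum revenue per (country, month, class); as_index=False keeps the keys as columns
totals = dff.groupby(["country", "date", "class"], as_index=False)["Revenue"].sum()
print(totals)
```

The two `cn` / `Jul-2019` / `B` rows collapse into a single row, while `country` and `class` stay available for the later reshaping step.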
To achieve feature 3,
dff.pivot(columns='class')['Revenue']
class A B C
0 100.0 NaN NaN
1 NaN NaN 200.0
2 NaN 100.0 NaN
3 NaN 90.0 NaN
4 NaN NaN 300.0
5 400.0 NaN NaN
6 NaN 500.0 NaN
I feel I am almost there, but I can't figure out how to combine these steps to achieve the final result. Any ideas are appreciated!
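One possible way to combine the three steps is `DataFrame.pivot_table`, sketched here under the assumption that revenue should be summed per country and month, with one column per class. Missing country/month/class combinations are filled with 0 through `fill_value`. The name `result` is only illustrative, and the month abbreviations produced by `%b` depend on the locale.

```python
import pandas as pd

dff = pd.DataFrame({
    "country": ["cn", "cn", "cn", "cn", "bn", "bn", "sn"],
    "date": ["2019-6-6", "2019-6-5", "2019-7-6", "2019-7-9",
             "2017-4-3", "2017-3-2", "2019-2-9"],
    "Revenue": [100, 200, 100, 90, 300, 400, 500],
    "class": ["A", "C", "B", "B", "C", "A", "B"],
})

# Feature 1: turn each date into a "Mon-YYYY" label
dff["date"] = pd.to_datetime(dff["date"]).dt.strftime("%b-%Y")

# Features 2 and 3: sum Revenue per (country, month) and spread class into columns
result = dff.pivot_table(
    index=["country", "date"],
    columns="class",
    values="Revenue",
    aggfunc="sum",
    fill_value=0,
).reset_index()

# Drop the leftover "class" label on the column axis
result.columns.name = None
print(result)
```

With the default settings the rows come out sorted by country and then by the date label, so the row order differs from the expected table above even though the values match.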