I have a periodic table that includes premium in different categories over a year for different companies. The dataframe looks like the below:
Company | Type | Month | Year | Ferdi Grup | Premium | |
---|---|---|---|---|---|---|
1 | Allianz | Birikimli Hayat | 1 | 2022 | Ferdi | 325 |
2 | Allianz | Birikimli Hayat | 2 | 2022 | Ferdi | 476 |
3 | Axa | Birikimli Hayat | 3 | 2022 | Ferdi | 687 |
I want to get a table where I can see the premium cumulated over 'Company' and 'Year'. For each month I want to see premium cumulated from the beginning of the year.
This is the regular sum operation which works well in this case.
data.pivot_table(
columns = 'Company',
index = 'Month',
values = 'Premium',
aggfunc= np.sum
)
However when I change to np.cumsum
the result is a series. I want a cumulated pivot table for each year, adding each month's value to the previous ones. How can I do that?
Expected output:
Company | Month | Year | Premium | |
---|---|---|---|---|
1 | Allianz | 1 | 2022 | 325 |
2 | Allianz | 2 | 2022 | 801 |
3 | Axa | 3 | 2022 | 687 |
So, this is the original data I am working with:
{'Company': {0: 'AgeSA',
1: 'Türkiye',
2: 'Türkiye',
3: 'AgeSA',
4: 'AgeSA',
5: 'Türkiye',
6: 'AgeSA',
7: 'Türkiye',
8: 'Türkiye',
9: 'AgeSA',
10: 'Türkiye',
11: 'Türkiye',
12: 'AgeSA',
13: 'Türkiye',
14: 'Türkiye',
15: 'AgeSA',
16: 'AgeSA',
17: 'Türkiye',
18: 'AgeSA',
19: 'Türkiye',
20: 'Türkiye',
21: 'AgeSA',
22: 'Türkiye',
23: 'Türkiye'},
'Type': {0: 'Birikimli Hayat',
1: 'Birikimli Hayat',
2: 'Sadece Yaşam Teminatlı',
3: 'Karma Sigorta',
4: 'Yıllık Vefat',
5: 'Yıllık Vefat',
6: 'Uzun Süreli Vefat',
7: 'Uzun Süreli Vefat',
8: 'Birikimli Hayat',
9: 'Yıllık Vefat',
10: 'Yıllık Vefat',
11: 'Uzun Süreli Vefat',
12: 'Birikimli Hayat',
13: 'Birikimli Hayat',
14: 'Sadece Yaşam Teminatlı',
15: 'Karma Sigorta',
16: 'Yıllık Vefat',
17: 'Yıllık Vefat',
18: 'Uzun Süreli Vefat',
19: 'Uzun Süreli Vefat',
20: 'Birikimli Hayat',
21: 'Yıllık Vefat',
22: 'Yıllık Vefat',
23: 'Uzun Süreli Vefat'},
'Month': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'Year': {0: 2022,
1: 2022,
2: 2022,
3: 2022,
4: 2022,
5: 2022,
6: 2022,
7: 2022,
8: 2022,
9: 2022,
10: 2022,
11: 2022,
12: 2022,
13: 2022,
14: 2022,
15: 2022,
16: 2022,
17: 2022,
18: 2022,
19: 2022,
20: 2022,
21: 2022,
22: 2022,
23: 2022},
'Ferdi Grup': {0: 'Ferdi',
1: 'Ferdi',
2: 'Ferdi',
3: 'Ferdi',
4: 'Ferdi',
5: 'Ferdi',
6: 'Ferdi',
7: 'Ferdi',
8: 'Grup',
9: 'Grup',
10: 'Grup',
11: 'Grup',
12: 'Ferdi',
13: 'Ferdi',
14: 'Ferdi',
15: 'Ferdi',
16: 'Ferdi',
17: 'Ferdi',
18: 'Ferdi',
19: 'Ferdi',
20: 'Grup',
21: 'Grup',
22: 'Grup',
23: 'Grup'},
'Premium': {0: 936622.43,
1: 14655.67,
2: 8496.0,
3: 124768619.29,
4: 6651019.24,
5: 11055383.530005993,
6: 54273212.457471885,
7: 22163192.66,
8: 81000.95,
9: 9338009.52,
10: 251790130.54997802,
11: 140949274.79999998,
12: 910808.77,
13: 8754.71,
14: 7128.0,
15: 129753498.31,
16: 8015974.454128993,
17: 16776490.000003006,
18: 67607915.34000003,
19: 24683694.700000003,
20: 60887.56,
21: 1497105.2458709963,
22: 195019190.297756,
23: 167424048.43},
'cumsum': {0: 936622.43,
1: 14655.67,
2: 23151.67,
3: 125705241.72000001,
4: 132356260.96000001,
5: 11078535.200005993,
6: 186629473.4174719,
7: 33241727.860005993,
8: 33322728.810005993,
9: 195967482.9374719,
10: 285112859.35998404,
11: 426062134.159984,
12: 196878291.7074719,
13: 426070888.869984,
14: 426078016.869984,
15: 326631790.0174719,
16: 334647764.4716009,
17: 442854506.869987,
18: 402255679.8116009,
19: 467538201.569987,
20: 467599089.129987,
21: 403752785.05747193,
22: 662618279.427743,
23: 830042327.857743}}
This is the result of a regular sum pivot:
AgeSA | Türkiye | |
---|---|---|
1 | 195967482.9374719 | 426062134.159984 |
2 | 207785302.12000003 | 403980193.69775903 |
When I use the suggested code as below:
df_2 = data.copy()
df_2['cumsum'] = df_2.groupby(['Company', 'Year'])[['Premium']].cumsum()
df_2.sort_values(['Company', 'Year', 'cumsum']).reset_index(drop = True)
Each line gets a cumsum value from the above lines it seems:
For me to be able to get the table I need, I need to get max in each group again in a pivot_table:
df_2.pivot_table(
index = ['Year', 'Month'],
values = ['Premium', 'cumsum'],
columns = 'Company',
aggfunc = {'Premium': 'sum', 'cumsum': 'max'}
)
which finally gets me to this result:
Is it that difficult to get the cumsum table in pandas or am I just doing it the hard way?