0

I have a periodic table that includes premium in different categories over a year for different companies. The dataframe looks like the below:

Company Type Month Year Ferdi Grup Premium
1 Allianz Birikimli Hayat 1 2022 Ferdi 325
2 Allianz Birikimli Hayat 2 2022 Ferdi 476
3 Axa Birikimli Hayat 3 2022 Ferdi 687

I want to get a table where I can see the premium cumulated over 'Company' and 'Year'. For each month I want to see premium cumulated from the beginning of the year.

This is the regular sum operation which works well in this case.

data.pivot_table(
    columns = 'Company', 
    index = 'Month', 
    values = 'Premium', 
    aggfunc= np.sum
)

However when I change to np.cumsum the result is a series. I want a cumulated pivot table for each year, adding each month's value to the previous ones. How can I do that?

Expected output:

Company Month Year Premium
1 Allianz 1 2022 325
2 Allianz 2 2022 801
3 Axa 3 2022 687

So, this is the original data I am working with:

{'Company': {0: 'AgeSA',
  1: 'Türkiye',
  2: 'Türkiye',
  3: 'AgeSA',
  4: 'AgeSA',
  5: 'Türkiye',
  6: 'AgeSA',
  7: 'Türkiye',
  8: 'Türkiye',
  9: 'AgeSA',
  10: 'Türkiye',
  11: 'Türkiye',
  12: 'AgeSA',
  13: 'Türkiye',
  14: 'Türkiye',
  15: 'AgeSA',
  16: 'AgeSA',
  17: 'Türkiye',
  18: 'AgeSA',
  19: 'Türkiye',
  20: 'Türkiye',
  21: 'AgeSA',
  22: 'Türkiye',
  23: 'Türkiye'},
 'Type': {0: 'Birikimli Hayat',
  1: 'Birikimli Hayat',
  2: 'Sadece Yaşam Teminatlı',
  3: 'Karma Sigorta',
  4: 'Yıllık Vefat',
  5: 'Yıllık Vefat',
  6: 'Uzun Süreli Vefat',
  7: 'Uzun Süreli Vefat',
  8: 'Birikimli Hayat',
  9: 'Yıllık Vefat',
  10: 'Yıllık Vefat',
  11: 'Uzun Süreli Vefat',
  12: 'Birikimli Hayat',
  13: 'Birikimli Hayat',
  14: 'Sadece Yaşam Teminatlı',
  15: 'Karma Sigorta',
  16: 'Yıllık Vefat',
  17: 'Yıllık Vefat',
  18: 'Uzun Süreli Vefat',
  19: 'Uzun Süreli Vefat',
  20: 'Birikimli Hayat',
  21: 'Yıllık Vefat',
  22: 'Yıllık Vefat',
  23: 'Uzun Süreli Vefat'},
 'Month': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 1,
  5: 1,
  6: 1,
  7: 1,
  8: 1,
  9: 1,
  10: 1,
  11: 1,
  12: 2,
  13: 2,
  14: 2,
  15: 2,
  16: 2,
  17: 2,
  18: 2,
  19: 2,
  20: 2,
  21: 2,
  22: 2,
  23: 2},
 'Year': {0: 2022,
  1: 2022,
  2: 2022,
  3: 2022,
  4: 2022,
  5: 2022,
  6: 2022,
  7: 2022,
  8: 2022,
  9: 2022,
  10: 2022,
  11: 2022,
  12: 2022,
  13: 2022,
  14: 2022,
  15: 2022,
  16: 2022,
  17: 2022,
  18: 2022,
  19: 2022,
  20: 2022,
  21: 2022,
  22: 2022,
  23: 2022},
 'Ferdi Grup': {0: 'Ferdi',
  1: 'Ferdi',
  2: 'Ferdi',
  3: 'Ferdi',
  4: 'Ferdi',
  5: 'Ferdi',
  6: 'Ferdi',
  7: 'Ferdi',
  8: 'Grup',
  9: 'Grup',
  10: 'Grup',
  11: 'Grup',
  12: 'Ferdi',
  13: 'Ferdi',
  14: 'Ferdi',
  15: 'Ferdi',
  16: 'Ferdi',
  17: 'Ferdi',
  18: 'Ferdi',
  19: 'Ferdi',
  20: 'Grup',
  21: 'Grup',
  22: 'Grup',
  23: 'Grup'},
 'Premium': {0: 936622.43,
  1: 14655.67,
  2: 8496.0,
  3: 124768619.29,
  4: 6651019.24,
  5: 11055383.530005993,
  6: 54273212.457471885,
  7: 22163192.66,
  8: 81000.95,
  9: 9338009.52,
  10: 251790130.54997802,
  11: 140949274.79999998,
  12: 910808.77,
  13: 8754.71,
  14: 7128.0,
  15: 129753498.31,
  16: 8015974.454128993,
  17: 16776490.000003006,
  18: 67607915.34000003,
  19: 24683694.700000003,
  20: 60887.56,
  21: 1497105.2458709963,
  22: 195019190.297756,
  23: 167424048.43},
 'cumsum': {0: 936622.43,
  1: 14655.67,
  2: 23151.67,
  3: 125705241.72000001,
  4: 132356260.96000001,
  5: 11078535.200005993,
  6: 186629473.4174719,
  7: 33241727.860005993,
  8: 33322728.810005993,
  9: 195967482.9374719,
  10: 285112859.35998404,
  11: 426062134.159984,
  12: 196878291.7074719,
  13: 426070888.869984,
  14: 426078016.869984,
  15: 326631790.0174719,
  16: 334647764.4716009,
  17: 442854506.869987,
  18: 402255679.8116009,
  19: 467538201.569987,
  20: 467599089.129987,
  21: 403752785.05747193,
  22: 662618279.427743,
  23: 830042327.857743}}

This is the result of a regular sum pivot:

AgeSA Türkiye
1 195967482.9374719 426062134.159984
2 207785302.12000003 403980193.69775903

When I use the suggested code as below:

df_2 = data.copy()
df_2['cumsum'] = df_2.groupby(['Company', 'Year'])[['Premium']].cumsum()

df_2.sort_values(['Company', 'Year', 'cumsum']).reset_index(drop = True)

Each line gets a cumsum value from the above lines it seems:

Image containing different cumsum values in each line.

For me to be able to get the table I need, I need to get max in each group again in a pivot_table:

df_2.pivot_table(
    index = ['Year', 'Month'], 
    values = ['Premium', 'cumsum'],
    columns = 'Company',
    aggfunc = {'Premium': 'sum', 'cumsum': 'max'}
    )

which finally gets me to this result:

Image showing pivot table results with cumsum

Is it that difficult to get the cumsum table in pandas or am I just doing it the hard way?

2 Answers2

1

Your dataframe is already in the right format, why you want to pivot it again? I think what you are searching for is a pandas.groupby.

df['cumsum_by_group'] = df.groupby(['Company', 'Year'])['Premium'].cumsum()

Output:

   Company             Type  Month  Year Ferdi Grup  Premium  cumsum_by_group
1  Allianz  Birikimli Hayat      1  2022      Ferdi      325              325
2  Allianz  Birikimli Hayat      2  2022      Ferdi      476              801
3      Axa  Birikimli Hayat      3  2022      Ferdi      687              687
Rabinzel
  • 7,757
  • 3
  • 10
  • 30
  • I want to pivot cumsum values over time so that I can plot. I have updated updated the question. Could you check again? – Mehmet Soydam Aug 27 '22 at 14:58
1

To calculate the cumulative sum over multiple colums of a dataframe, you can use pandas.DataFrame.groupby and pandas.DataFrame.cumsum combined.

Assuming that data is the dataframe that holds the original dataset, use the code below :

data['Premium'] = data.groupby(['Company', 'Year'])['Premium'].cumsum()
out = data[['Company', 'Month', 'Year', 'Premium']] #to select the specific columns

>>> print(out)

enter image description here

Timeless
  • 22,580
  • 4
  • 12
  • 30