0

I have a dataframe containing:

                State     Country      Date      Cases
0                NaN    Afghanistan 2020-01-22      0
271              NaN    Afghanistan 2020-01-23      0
...              ...            ...        ...    ...
85093            NaN       Zimbabwe 2020-11-30   9950
85364            NaN       Zimbabwe 2020-12-01  10129

I'm trying to create a new column of cumulative cases but grouped by Country AND State.

                State     Country      Date      Cases  Total Cases
231          California     USA     2020-01-22      5        5
342          California     USA     2020-01-23     10        15
233            Texas        USA     2020-01-22      4        4
322            Texas        USA     2020-01-23     12        16

I have been trying to follow Pandas groupby cumulative sum and have tried things such as:

df['Total'] = df.groupby(['State','Country'])['Cases'].cumsum()

Returns a series of -1's

df['Total'] = df.groupby(['State', 'Country']).sum() \
                .groupby(level=0).cumsum().reset_index()

Returns the sum.

df['Total'] = df.groupby(['Country'])['Cases'].apply(lambda x: x.cumsum())

Doesnt separate sums by state.

df_f['Total'] = df_f.groupby(['Region','State'])['Cases'].apply(lambda x: x.cumsum())

This one works exept when 'State' is NaN, 'Total' is also NaN.

Ravi
  • 2,778
  • 2
  • 20
  • 32
MiguelL
  • 79
  • 7

1 Answers1

0
arrays = [['California', 'California', 'Texas', 'Texas'],
          ['USA', 'USA', 'USA', 'USA'], 
          ['2020-01-22','2020-01-23','2020-01-22','2020-01-23'], [5,10,4,12]]
df = pd.DataFrame(list(zip(*arrays)), columns = ['State', 'Country', 'Date', 'Cases'])
df
    State       Country Date        Cases
0   California  USA     2020-01-22  5
1   California  USA     2020-01-23  10
2   Texas       USA     2020-01-22  4
3   Texas       USA     2020-01-23  12

temp = df.set_index(['State', 'Country','Date'], drop=True).sort_index( )
df['Total Cases'] = temp.groupby(['State', 'Country']).cumsum().reset_index()['Cases']
df
    State       Country Date        Cases   Total Cases
0   California  USA     2020-01-22  5       5
1   California  USA     2020-01-23  10      15
2   Texas       USA     2020-01-22  4       4
3   Texas       USA     2020-01-23  12      16
David Makovoz
  • 1,766
  • 2
  • 16
  • 27