0

I have a pandas dataframe as follows:

    df = pd.DataFrame({
'State':['am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','C','C','C','A','A','A','A'],
'Party':['alpha','beta','delta','alpha','beta','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,4,50,11,2,60,3,1,70,12,34,52,43]
})

I want to add a Total column, which will contain the sums of the votes for each PC. Note that the PC can have the same name (e.g. 'A' above in two different states 'am' and 'fg', so we want to sum them separately, since they are different pc).

In summary, this is the output I want:

    df_out = pd.DataFrame({
'State':['am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','C','C','C','A','A','A','A'],
'Party':['alpha','beta','delta','alpha','beta','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,4,50,11,2,60,3,1,70,12,34,52,43],
'Total':[64,64,64,73,73,73,74,74,74,141,141,141,141]
})

I was trying to use cumsum

    df['cumsum'] = df['Votes'].cumsum()

But this is producing a cumulative sum and I dont see a way to condition on the values in columns 'State' and columns 'pc'. How do I do that to get desired output?

Alhpa Delta
  • 3,385
  • 4
  • 16
  • 31

1 Answers1

3

You want to use groupby and transform to get the total per State and PC. Transform keeps the shape of your original dataframe:

df['Total'] = df.groupby(['State', 'PC']).Votes.transform('sum')

print(df)
   State PC   Party  Votes  Total
0     am  A   alpha     10     64
1     am  A    beta      4     64
2     am  A   delta     50     64
3     am  B   alpha     11     73
4     am  B    beta      2     73
5     am  B   gamma     60     73
6     fg  C   alpha      3     74
7     fg  C    beta      1     74
8     fg  C   kappa     70     74
9     fg  A   alpha     12    141
10    fg  A   gamma     34    141
11    fg  A   kappa     52    141
12    fg  A  lambda     43    141
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • 1
    Just literally finished reading about this function on practical business python, nice one. +1 – Umar.H Apr 01 '19 at 00:31
  • Thanks! one more thing. How do I then process this data frame to retain the top 2 by Votes for each 'State' and 'pc' AND keep the Party 'beta' too? So basically, each State-PC combination will have two rows of top two parties by 'Vote' and if 'beta' is not among the top 2 then it will have three rows. – Alhpa Delta Apr 01 '19 at 00:36
  • Also, how does this work if the column 'Votes' is named 'Votes Polled'? – Alhpa Delta Apr 01 '19 at 00:43
  • Looks like original problem is solved. As for next problem, that would be a different question, consider posting it separately. And link the new question here in the comments. I will answer it. Please dont forget to mark an answer as the right one when you are helped. :) @AlhpaDelta – Erfan Apr 01 '19 at 00:47
  • I will do that. thanks for the suggestion! – Alhpa Delta Apr 01 '19 at 01:11
  • @Erfan, here it is: https://stackoverflow.com/questions/55447120/adding-additional-rows-to-pandas-dataframe-to-capture-residual-value-while-retai – Alhpa Delta Apr 01 '19 at 01:43