0

I have the following problem: I have a pandas DataFrame with a column 'features' and a column 'VOTES'. 'VOTES' is numeric, and 'features' is a string whose values repeat across rows. I want to group by 'features' and sum the values of 'VOTES'.

Dataframe initially:

+----------+---------+
| features | VOTES   |
+----------+---------+
| A        | 4       |
+----------+---------+
| V        | 3       |
+----------+---------+
| A        | 2       |
+----------+---------+
| C        | 9       |
+----------+---------+

I tried the following, but I got NaN values in the 'VOTES' column:

dataframe_clusters['VOTES'] = dataframe_clusters.groupby('features')['VOTES'].sum()

I want to get the following result:

+----------+---------+
| features | VOTES   |
+----------+---------+
| A        | 6       |
+----------+---------+
| V        | 3       |
+----------+---------+
| C        | 9       |
+----------+---------+
jartymcfly

3 Answers

1

You can do it this way:

dataframe_clusters.groupby('features').sum().reset_index()

Output:

  features  VOTES
0        A      6
1        C      9
2        V      3
Joe
0

You can use reset_index or the parameter as_index=False. To keep the values of features in their original order instead of sorting them, you can also add the parameter sort=False:

df = dataframe_clusters.groupby('features', sort=False)['VOTES'].sum().reset_index()

df = dataframe_clusters.groupby('features', as_index=False, sort=False)['VOTES'].sum()

print (df)
  features  VOTES
0        A      6
1        V      3
2        C      9
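
As a quick check, the two forms above should produce the same frame. This is a minimal sketch that assumes the sample data from the question; the names via_reset and via_as_index are made up for illustration:

```python
import pandas as pd

# Sample data copied from the question.
dataframe_clusters = pd.DataFrame(
    {"features": ["A", "V", "A", "C"], "VOTES": [4, 3, 2, 9]}
)

# Form 1: aggregate, then move the group labels from the index into a column.
via_reset = (
    dataframe_clusters.groupby("features", sort=False)["VOTES"].sum().reset_index()
)

# Form 2: ask groupby to keep the group labels as a regular column.
via_as_index = dataframe_clusters.groupby(
    "features", as_index=False, sort=False
)["VOTES"].sum()

# With sort=False the groups appear in order of first occurrence: A, V, C.
print(via_reset)
print(via_as_index)
```

Either form works; as_index=False just saves the extra reset_index call.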

If you want to assign the result back to a column of the original DataFrame, you can use GroupBy.transform, which returns a Series of aggregated values with the same size as the original DataFrame:

dataframe_clusters['VOTES'] = dataframe_clusters.groupby('features')['VOTES'].transform('sum')
print (dataframe_clusters)

  features  VOTES
0        A      6
1        V      3
2        A      6
3        C      9
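
The NaN values in the question come from index alignment: sum() returns a Series indexed by the group labels ('A', 'C', 'V'), while the DataFrame is indexed by 0..3, so no label matches when the Series is assigned to a column. A minimal sketch of the difference, assuming the sample data from the question (sums, misaligned and broadcast are illustrative names):

```python
import pandas as pd

# Sample data copied from the question.
dataframe_clusters = pd.DataFrame(
    {"features": ["A", "V", "A", "C"], "VOTES": [4, 3, 2, 9]}
)

# Aggregation gives one value per group, indexed by the group labels.
sums = dataframe_clusters.groupby("features")["VOTES"].sum()

# Column assignment aligns on index labels. The labels 'A', 'C', 'V' do not
# match the row labels 0..3, so every value becomes NaN.
misaligned = dataframe_clusters.assign(VOTES=sums)

# transform keeps the original row index and repeats each group's sum
# on every row of that group, so assignment lines up.
broadcast = dataframe_clusters.groupby("features")["VOTES"].transform("sum")

print(misaligned)
print(broadcast)
```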
jezrael
0

From your question it is not really clear what you need in the end. The grouping you're doing is OK, but for some reason you're assigning its result to a column of the same DataFrame. I'm guessing that you need a join in the end. Check this:

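import pandas as pd

# Sample data from the question
df = pd.DataFrame(data={'features': ['A', 'V', 'A', 'C'], 'VOTES': [4, 3, 2, 9]})
# Per-group totals, indexed by 'features'
totals = df.groupby('features').sum()
print(df)
print(totals)
# Attach each row's group total; rsuffix renames the overlapping VOTES column
joined = df.join(totals, on='features', rsuffix='_total')
print(joined)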

It will give you this:

   VOTES features
0      4        A
1      3        V
2      2        A
3      9        C
          VOTES
features       
A             6
C             9
V             3
   VOTES features  VOTES_total
0      4        A            6
1      3        V            3
2      2        A            6
3      9        C            9
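
If the join is only there to put the group total next to each row, GroupBy.transform is a possible alternative that skips the intermediate totals frame. This is a sketch under that assumption, not necessarily what you need; with_total is an illustrative name:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'features': ['A', 'V', 'A', 'C'], 'VOTES': [4, 3, 2, 9]})

# Join-based version, as in the answer above.
totals = df.groupby('features').sum()
joined = df.join(totals, on='features', rsuffix='_total')

# transform-based version: compute the per-group sum aligned to the
# original rows and add it as a new column in one step.
with_total = df.assign(
    VOTES_total=df.groupby('features')['VOTES'].transform('sum')
)

print(joined)
print(with_total)
```

The join version is handier when totals carries several aggregated columns; transform is shorter when only one is needed.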
Alexander Goida