0

Have a dataframe where I need to check , group by and sum all the data

I have used regex function to find and group all the particular group of data starts with respective countries.

Suppose I have a dataset

Countries    31-12-17   1-1-18  2-1-18  3-1-18  Sum
India-Basic    1200      1100    800     900    4000
Sweden-Basic   1500      1300    700     1500   5000
Norway-Basic   800       400     900      900   3000
India-Exp      600       1400    300      200   2500
Sweden-Exp     1800      400     600     700    3500
Norway-Exp     1300      1600    1100    1500   4500

Expected Output :

Countries    Sum
India        6500
Sweden       8500
Norway       7500

India

Manz
  • 593
  • 5
  • 23
  • Does this answer your question? [Pandas sum by groupby, but exclude certain columns](https://stackoverflow.com/questions/32751229/pandas-sum-by-groupby-but-exclude-certain-columns) – oreopot Apr 13 '20 at 05:53

2 Answers2

3

Use for regex solution Series.str.extract and aggregate sum:

df1 = (df.groupby(df['Countries'].str.extract('(.*)-', expand=False), sort=False)['Sum']
         .sum()
         .reset_index())
print (df1)
  Countries   Sum
0     India  6500
1    Sweden  8500
2    Norway  7500

Alternative si split Countries by - and select first lists by str[0]:

df1 = (df.groupby(df['Countries'].str.split('-').str[0], sort=False)['Sum']
        .sum()
        .reset_index())
print (df1)
  Countries   Sum
0     India  6500
1    Sweden  8500
2    Norway  7500
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    Downvoter, if there's something wrong with my answer, please let me know, so I can correct it. Thanks. – jezrael Apr 13 '20 at 04:56
  • how to sum the data from 2-1-18 (date) column onwards . ? – Manz Apr 13 '20 at 05:54
  • @neeraj04 - Do you think how create last `Sum` column? It means there is no `Sum` column? Then use `df['Sum'] = df.loc[:, '2-1-18':].sum(axis=1)` – jezrael Apr 13 '20 at 05:57
  • df['Sum'] = df.loc[:, '2-1-18':].sum(axis=1) is not working , its giving unexceptional value – Manz Apr 13 '20 at 06:28
  • @neeraj04 - So need `df['Sum'] = df.loc[:, '2-1-18':].drop('Sum', axis=1).sum(axis=1)` ? Because `Sum` column already exist? – jezrael Apr 13 '20 at 06:31
  • I have created an separate column with the column name "sumval" and the row data contains some null values as well . – Manz Apr 13 '20 at 06:37
  • @neeraj04 - not sure if understand. What is `giving unexceptional value` ? sum use some more columns? Or some error? – jezrael Apr 13 '20 at 06:38
  • When I am finding the sum in the excel by manually selecting the required row its giving different value but when finding it with the above statement its giving different value – Manz Apr 13 '20 at 06:40
  • @neeraj04 - What is difference? there is count some column which has to be excluded? I have no real data, so is necessary be detective here ;) – jezrael Apr 13 '20 at 06:42
1

this could work - note that i only filtered for the columns that are relevant :

(df.filter(['Countries','Sum'])
 .assign(Countries = lambda x: x.Countries.str.split('-').str.get(0))
 .groupby('Countries')
 .agg('sum')
)    


             Sum
Countries   
 India      6500
 Norway     7500
 Sweden     8500
sammywemmy
  • 27,093
  • 4
  • 17
  • 31