how to check and groupby all the objects starts with of a dataframe in column

Question

Have a dataframe where I need to check , group by and sum all the data

I have used regex function to find and group all the particular group of data starts with respective countries.

Suppose I have a dataset

Countries    31-12-17   1-1-18  2-1-18  3-1-18  Sum
India-Basic    1200      1100    800     900    4000
Sweden-Basic   1500      1300    700     1500   5000
Norway-Basic   800       400     900      900   3000
India-Exp      600       1400    300      200   2500
Sweden-Exp     1800      400     600     700    3500
Norway-Exp     1300      1600    1100    1500   4500

Expected Output :

Countries    Sum
India        6500
Sweden       8500
Norway       7500

India

Does this answer your question? [Pandas sum by groupby, but exclude certain columns](https://stackoverflow.com/questions/32751229/pandas-sum-by-groupby-but-exclude-certain-columns) — oreopot, Apr 13 '20 at 05:53

jezrael · Answer 1 · 2020-04-13T04:58:20.060

3

Use for regex solution Series.str.extract and aggregate sum:

df1 = (df.groupby(df['Countries'].str.extract('(.*)-', expand=False), sort=False)['Sum']
         .sum()
         .reset_index())
print (df1)
  Countries   Sum
0     India  6500
1    Sweden  8500
2    Norway  7500

Alternative si split Countries by - and select first lists by str[0]:

df1 = (df.groupby(df['Countries'].str.split('-').str[0], sort=False)['Sum']
        .sum()
        .reset_index())
print (df1)
  Countries   Sum
0     India  6500
1    Sweden  8500
2    Norway  7500

edited Apr 13 '20 at 04:58

answered Apr 13 '20 at 04:51

jezrael

822,522
95
1,334
1,252

2

Downvoter, if there's something wrong with my answer, please let me know, so I can correct it. Thanks. – jezrael Apr 13 '20 at 04:56
how to sum the data from 2-1-18 (date) column onwards . ? – Manz Apr 13 '20 at 05:54
@neeraj04 - Do you think how create last `Sum` column? It means there is no `Sum` column? Then use `df['Sum'] = df.loc[:, '2-1-18':].sum(axis=1)` – jezrael Apr 13 '20 at 05:57
df['Sum'] = df.loc[:, '2-1-18':].sum(axis=1) is not working , its giving unexceptional value – Manz Apr 13 '20 at 06:28
@neeraj04 - So need `df['Sum'] = df.loc[:, '2-1-18':].drop('Sum', axis=1).sum(axis=1)` ? Because `Sum` column already exist? – jezrael Apr 13 '20 at 06:31
I have created an separate column with the column name "sumval" and the row data contains some null values as well . – Manz Apr 13 '20 at 06:37
@neeraj04 - not sure if understand. What is `giving unexceptional value` ? sum use some more columns? Or some error? – jezrael Apr 13 '20 at 06:38
When I am finding the sum in the excel by manually selecting the required row its giving different value but when finding it with the above statement its giving different value – Manz Apr 13 '20 at 06:40
@neeraj04 - What is difference? there is count some column which has to be excluded? I have no real data, so is necessary be detective here ;) – jezrael Apr 13 '20 at 06:42

score 1 · Answer 2 · answered Apr 13 '20 at 04:53

this could work - note that i only filtered for the columns that are relevant :

(df.filter(['Countries','Sum'])
 .assign(Countries = lambda x: x.Countries.str.split('-').str.get(0))
 .groupby('Countries')
 .agg('sum')
)    


             Sum
Countries   
 India      6500
 Norway     7500
 Sweden     8500

how to check and groupby all the objects starts with of a dataframe in column

2 Answers2