PYTHON Pandas: sum with string in two dataframe

Question

I am trying to find out how to make this:

I want to create the 'total sum' in df2 with a sum of all the 'value' integers from df1 for each one of my list of string in the df2 column 'agence'.

My actual list of agence is 300+ so I am looking for something where I don't have to repeat manually for each agence name.

EDIT: Sorry I thougt of adding the picture to illustrate my question because I have difficulty phrasing it, but I should also have added the reproducible example dataframes below:

import pandas as pd
df1 = pd.DataFrame({'agence': ['a', 'a, b', 'c, d, a', 'd', 'e, a', 'b, d'], 
                   'value': [1, 4, 2, 5, 2, 10]})
df2 = pd.DataFrame({'agence': ['a', 'b', 'c', 'd', 'e',]})

Please take a tour of [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Shubham Sharma, Jun 12 '20 at 11:30
I just corrected the post, sorry for the initial typical not reproducible pandas examples :) — Ad D, Jun 12 '20 at 11:41

CHRD · Accepted Answer · 2020-06-12T11:53:08.313

5

Use str.split() and explode() to expand the data. Then regroup using groupby() and sum() aggregation:

df2 = (df1.assign(agence=df1['agence'].str.split(', '))
       .explode('agence')
       .groupby('agence')
       .sum()
       .reset_index()
       .rename(columns=({'value': 'total sum'})))

df2
    agence  total sum
0   a       9
1   b       14
2   c       2
3   d       17
4   e       2

edited Jun 12 '20 at 11:53

answered Jun 12 '20 at 11:42

CHRD

1,917
1
15
38

1

No need to create an extra copy try this `df1.assign(agence=df1['agence'].str.split(', ')).explode('agence').groupby('agence').sum()` – Ch3steR Jun 12 '20 at 11:51
1

Deleted my answer as it was almost same as yours. Nice answer +1. – Ch3steR Jun 12 '20 at 11:54

score 1 · Answer 2 · answered Jun 12 '20 at 11:47

This is not the most efficient way, but if the DataFrame is not too big, I would simply iterate over each row, parse the 'agence' value and use a Counter object to compute the sum. I find this solution easy to read and understand for someone not super familiar with Pandas.

from collections import Counter

df1 = pd.DataFrame(
    {
        'agence':['a','a, b','c, d, a', 'd', 'e, a', 'b, d'],
        'value':[1,4,2,5,2,10]
    }
)

c = Counter()

for i,row in df1.iterrows():
    entries = row['agence'].split(', ')
    for e in entries:
        c.update({e:row['value']})


df2 = pd.DataFrame.from_dict(c, orient='index').reset_index()
df2.columns = ['agence', 'total_sum']
print(df2)

  agence  total_sum
0      a          9
1      b         14
2      c          2
3      d         17
4      e          2

PYTHON Pandas: sum with string in two dataframe

2 Answers2