2

I am trying to find out how to make this:

enter image description here

I want to create the 'total sum' in df2 with a sum of all the 'value' integers from df1 for each one of my list of string in the df2 column 'agence'.

My actual list of agence is 300+ so I am looking for something where I don't have to repeat manually for each agence name.

EDIT: Sorry I thougt of adding the picture to illustrate my question because I have difficulty phrasing it, but I should also have added the reproducible example dataframes below:

import pandas as pd
df1 = pd.DataFrame({'agence': ['a', 'a, b', 'c, d, a', 'd', 'e, a', 'b, d'], 
                   'value': [1, 4, 2, 5, 2, 10]})
df2 = pd.DataFrame({'agence': ['a', 'b', 'c', 'd', 'e',]})
Ad D
  • 189
  • 1
  • 9
  • 4
    Please take a tour of [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Shubham Sharma Jun 12 '20 at 11:30
  • 3
    I just corrected the post, sorry for the initial typical not reproducible pandas examples :) – Ad D Jun 12 '20 at 11:41

2 Answers2

5

Use str.split() and explode() to expand the data. Then regroup using groupby() and sum() aggregation:

df2 = (df1.assign(agence=df1['agence'].str.split(', '))
       .explode('agence')
       .groupby('agence')
       .sum()
       .reset_index()
       .rename(columns=({'value': 'total sum'})))

df2
    agence  total sum
0   a       9
1   b       14
2   c       2
3   d       17
4   e       2
CHRD
  • 1,917
  • 1
  • 15
  • 38
  • 1
    No need to create an extra copy try this `df1.assign(agence=df1['agence'].str.split(', ')).explode('agence').groupby('agence').sum()` – Ch3steR Jun 12 '20 at 11:51
  • 1
    Deleted my answer as it was almost same as yours. Nice answer +1. – Ch3steR Jun 12 '20 at 11:54
1

This is not the most efficient way, but if the DataFrame is not too big, I would simply iterate over each row, parse the 'agence' value and use a Counter object to compute the sum. I find this solution easy to read and understand for someone not super familiar with Pandas.

from collections import Counter

df1 = pd.DataFrame(
    {
        'agence':['a','a, b','c, d, a', 'd', 'e, a', 'b, d'],
        'value':[1,4,2,5,2,10]
    }
)

c = Counter()

for i,row in df1.iterrows():
    entries = row['agence'].split(', ')
    for e in entries:
        c.update({e:row['value']})


df2 = pd.DataFrame.from_dict(c, orient='index').reset_index()
df2.columns = ['agence', 'total_sum']
print(df2)

  agence  total_sum
0      a          9
1      b         14
2      c          2
3      d         17
4      e          2
alec_djinn
  • 10,104
  • 8
  • 46
  • 71