How to append lists using groupby? ValueError: Function does not reduce

Question

I have the following dataframe:

df=pd.DataFrame({'code1':["A","B","A"],"code2":["k","l","k"],'Names':[['EUGENIO NETO','JUAN MATIAS SERAGOPIAN'],['EUGENIO LUPORINI NETO'],['SIMONE FANKHAUSER','ALEX SOUZA']]})

  code1 code2                                   Names
0     A     k  [EUGENIO NETO, JUAN MATIAS SERAGOPIAN]
1     B     l                 [EUGENIO LUPORINI NETO]
2     A     k         [SIMONE FANKHAUSER, ALEX SOUZA]

I want to group by code1 and code2 and combine the lists in Names, so that the result looks like this:

  code1 code2  Names
0     A     k  [EUGENIO NETO, JUAN MATIAS SERAGOPIAN, SIMONE FANKHAUSER, ALEX SOUZA]
1     B     l  [EUGENIO LUPORINI NETO]

Already checked the following answers:

Groupby and append lists and strings

pandas groupby and join lists

So I tried to adapt the answers to those questions to my case, but none of my attempts worked:

df['Names']=df[['code1','code2',"Names"]].groupby(['code1','code2'])["Names"].agg('sum')
----> ValueError: Function does not reduce

df['Names']=df[['code1','code2',"Names"]].groupby(['code1','code2'])["Names"].agg('Names')
----> AttributeError: 'SeriesGroupBy' object has no attribute 'Names'

df['Names']=df[['code1','code2',"Names"]].groupby(['code1','code2'])["Names"].transform(lambda x: append(x))
----> NameError: name 'append' is not defined

Am I missing something, or is this the wrong approach?

EDIT

Andrej and NYC Coder suggested solutions that do work on the sample above. But when I ran them on a larger dataset I got the same ValueError: Function does not reduce. I researched what could cause that and found this question: Pandas Groupby Agg Function Does Not Reduce

The accepted answer there suggests using tuples, since lists are problematic. Another answer explains where the error is raised in the pandas code. Would tuples be the best way? How would I apply that here?

What version of pandas are you using? – Scott Boston Jun 18 '20 at 22:40

Andrej Kesely · Accepted Answer · 2020-06-18T23:04:33.883

print( df.groupby(['code1', 'code2'], as_index=False).agg('sum') )

Prints:

  code1 code2                                              Names
0     A     k  [EUGENIO NETO, JUAN MATIAS SERAGOPIAN, SIMONE ...
1     B     l                            [EUGENIO LUPORINI NETO]

EDIT: A solution with itertools.chain:

import pandas as pd
from itertools import chain

df=pd.DataFrame({'code1':["A","B","A"],"code2":["k","l","k"],'Names':[['EUGENIO NETO','JUAN MATIAS SERAGOPIAN'],['EUGENIO LUPORINI NETO'],['SIMONE FANKHAUSER','ALEX SOUZA']]})
# Flatten each group's lists into a single list
print( df.groupby(['code1', 'code2'], as_index=False).agg(lambda x: list(chain.from_iterable(x))) )
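For reference, here is a minimal sketch of a variant that selects the Names column first and uses `apply` instead of `agg`. How `agg` treats list-valued results has varied between pandas versions, so this is only one possible workaround, not a guaranteed fix for the ValueError. The helper name `flatten` and the variable `merged` are made up for this illustration.

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame({
    'code1': ["A", "B", "A"],
    'code2': ["k", "l", "k"],
    'Names': [['EUGENIO NETO', 'JUAN MATIAS SERAGOPIAN'],
              ['EUGENIO LUPORINI NETO'],
              ['SIMONE FANKHAUSER', 'ALEX SOUZA']],
})

def flatten(lists):
    # Hypothetical helper: concatenate every list in the group into one list
    return list(chain.from_iterable(lists))

# apply() stores whatever the function returns as the group's value,
# so each group ends up holding one flat list
merged = (
    df.groupby(['code1', 'code2'])['Names']
      .apply(flatten)
      .reset_index()  # turn the group keys back into regular columns
)
print(merged)
```

Note that the result is a new, shorter frame (one row per group), so it should be kept in its own variable rather than assigned back to `df['Names']`.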

edited Jun 18 '20 at 23:04

answered Jun 18 '20 at 22:40

Andrej Kesely


That is correct, but when I run it in a larger dataset I get: ValueError: Function does not reduce. Any idea what it could be? – aabujamra Jun 18 '20 at 22:48
Just added an edit with an answer that talks about that – aabujamra Jun 18 '20 at 22:52
@abutremutante Can you try `df.groupby(['code1', 'code2'], as_index=False).agg(lambda x: sum(x, []))` ? or `df.groupby(['code1', 'code2'], as_index=False).agg(lambda x: list(itertools.chain.from_iterable(x)))` – Andrej Kesely Jun 18 '20 at 23:00
Sweet. Works perfectly tks – aabujamra Jun 18 '20 at 23:07

NYC Coder · Answer 2 · 2020-06-18T22:59:04.603

This should do it:

df['Names'] = df['Names'].agg(lambda x: ','.join(map(str, x)))
df = df.groupby(by=['code1', 'code2'], as_index=False).agg('sum')
print(df)

Prints:

  code1 code2                                              Names
0     A     k  EUGENIO NETO,JUAN MATIAS SERAGOPIANSIMONE FANK...
1     B     l                              EUGENIO LUPORINI NETO

edited Jun 18 '20 at 22:59

answered Jun 18 '20 at 22:40

NYC Coder


Thanks for that, it works indeed, but when I run it on a larger dataset I get ValueError: Function does not reduce. I just added an edit with an answer that talks about that – aabujamra Jun 18 '20 at 22:52
Thanks it works but the reason I wanted to join lists instead of strings is that I want to drop duplicates of the joined lists prior to turning it into strings... with this solution I can't do that unfortunately – aabujamra Jun 18 '20 at 23:03
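To illustrate the goal described in the comment above (merge the lists, drop duplicates, then build the string), here is a rough sketch. It assumes that the first-seen order should be preserved and that ', ' is an acceptable separator. The repeated 'EUGENIO NETO' in the third row is added purely for this illustration, and the names `merge_unique`, `merged` and `Names_str` are hypothetical.

```python
import pandas as pd
from itertools import chain

# Sample data adapted from the question; the duplicate name in the last
# row is an assumption made only to show the de-duplication step
df = pd.DataFrame({
    'code1': ["A", "B", "A"],
    'code2': ["k", "l", "k"],
    'Names': [['EUGENIO NETO', 'JUAN MATIAS SERAGOPIAN'],
              ['EUGENIO LUPORINI NETO'],
              ['SIMONE FANKHAUSER', 'EUGENIO NETO']],
})

def merge_unique(lists):
    # Flatten the group's lists, then drop duplicates while keeping
    # first-seen order (dict keys preserve insertion order)
    return list(dict.fromkeys(chain.from_iterable(lists)))

merged = (
    df.groupby(['code1', 'code2'])['Names']
      .apply(merge_unique)
      .reset_index()
)

# Only after de-duplicating, join each list into a single string
merged['Names_str'] = merged['Names'].map(', '.join)
print(merged)
```

If order does not matter, `sorted(set(...))` would be a simpler alternative to `dict.fromkeys`.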
