
I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'code1': ["A", "B", "A"], 'code2': ["k", "l", "k"], 'Names': [['EUGENIO NETO', 'JUAN MATIAS SERAGOPIAN'], ['EUGENIO LUPORINI NETO'], ['SIMONE FANKHAUSER', 'ALEX SOUZA']]})

  code1 code2                                   Names
0     A     k  [EUGENIO NETO, JUAN MATIAS SERAGOPIAN]
1     B     l                 [EUGENIO LUPORINI NETO]
2     A     k         [SIMONE FANKHAUSER, ALEX SOUZA]

I want to group by code1 and code2 and combine the lists in Names, so that the result looks like this:

  code1 code2  Names
0     A     k  [EUGENIO NETO, JUAN MATIAS SERAGOPIAN, SIMONE FANKHAUSER, ALEX SOUZA]
1     B     l  [EUGENIO LUPORINI NETO]

I have already checked the following answers:

Groupby and append lists and strings

pandas groupby and join lists

I tried to adapt the answers to those questions to my case, but could not get any of them to work:

df['Names']=df[['code1','code2',"Names"]].groupby(['code1','code2'])["Names"].agg('sum')
----> ValueError: Function does not reduce

df['Names']=df[['code1','code2',"Names"]].groupby(['code1','code2'])["Names"].agg('Names')
----> AttributeError: 'SeriesGroupBy' object has no attribute 'Names'

df['Names']=df[['code1','code2',"Names"]].groupby(['code1','code2'])["Names"].transform(lambda x: append(x))
----> NameError: name 'append' is not defined

Am I missing something, or is this the wrong approach?

EDIT

Andrej and NYC Coder both suggested working solutions. But when I ran them on a larger dataset, I got the same ValueError: Function does not reduce. I looked into what could cause it and found this question: Pandas Groupby Agg Function Does Not Reduce

The accepted answer suggests using tuples, since lists are problematic. Another answer explains where the error is raised in the pandas code. Would tuples be the best approach? How would I apply that here?
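For illustration, here is a minimal sketch of how the tuple idea might be applied to the example dataframe. It assumes the error comes from the aggregation returning a list, so each group's lists are flattened into a tuple instead and converted back to lists afterwards. The variable name combined is only illustrative, and whether this avoids the error on the larger dataset depends on the pandas version, so treat it as something to try rather than a confirmed fix.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "code1": ["A", "B", "A"],
    "code2": ["k", "l", "k"],
    "Names": [
        ["EUGENIO NETO", "JUAN MATIAS SERAGOPIAN"],
        ["EUGENIO LUPORINI NETO"],
        ["SIMONE FANKHAUSER", "ALEX SOUZA"],
    ],
})

# Flatten each group's lists into a single tuple, so the aggregation
# returns a tuple (not a list) per group.
combined = (
    df.groupby(["code1", "code2"])["Names"]
      .agg(lambda lists: tuple(chain.from_iterable(lists)))
      .reset_index()
)

# Assumption: lists are wanted in the final result, so convert back
# once the groupby is done.
combined["Names"] = combined["Names"].apply(list)
print(combined)
```

Note that the result is a new, shorter dataframe (one row per group), so it is kept in its own variable rather than assigned back to df['Names'].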

aabujamra

2 Answers

print( df.groupby(['code1', 'code2'], as_index=False).agg('sum') )

Prints:

  code1 code2                                              Names
0     A     k  [EUGENIO NETO, JUAN MATIAS SERAGOPIAN, SIMONE ...
1     B     l                            [EUGENIO LUPORINI NETO]

EDIT: A solution with itertools.chain:

from itertools import chain

import pandas as pd

df=pd.DataFrame({'code1':["A","B","A"],"code2":["k","l","k"],'Names':[['EUGENIO NETO','JUAN MATIAS SERAGOPIAN'],['EUGENIO LUPORINI NETO'],['SIMONE FANKHAUSER','ALEX SOUZA']]})
print( df.groupby(['code1', 'code2'], as_index=False).agg(lambda x: list(chain.from_iterable(x))) )
Andrej Kesely

This should do it:

df['Names'] = df['Names'].agg(lambda x: ','.join(map(str, x)))
df = df.groupby(by=['code1', 'code2'], as_index=False).agg('sum')
print(df)

  code1 code2                                              Names
0     A     k  EUGENIO NETO,JUAN MATIAS SERAGOPIANSIMONE FANK...
1     B     l                              EUGENIO LUPORINI NETO
NYC Coder
  • Thanks for that, it works indeed, but when I run it on a larger dataset I get ValueError: Function does not reduce. I just added an edit pointing to an answer that discusses it – aabujamra Jun 18 '20 at 22:52
  • Thanks, it works, but the reason I wanted to join lists instead of strings is that I want to drop duplicates from the joined lists before turning them into strings. With this solution I can't do that, unfortunately – aabujamra Jun 18 '20 at 23:03
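Regarding the last comment, one possible way to drop duplicates before joining is to explode the lists into one row per name, de-duplicate within each group, and only then collect the names back into lists. This is a sketch under stated assumptions, not something proposed in the answers above: the repeated 'EUGENIO NETO' in the third row is hypothetical data added to show the de-duplication, and the column name Names_str is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "code1": ["A", "B", "A"],
    "code2": ["k", "l", "k"],
    "Names": [
        ["EUGENIO NETO", "JUAN MATIAS SERAGOPIAN"],
        ["EUGENIO LUPORINI NETO"],
        # Hypothetical duplicate ('EUGENIO NETO') added for illustration.
        ["SIMONE FANKHAUSER", "ALEX SOUZA", "EUGENIO NETO"],
    ],
})

# One row per name, keeping the group keys on every row.
exploded = df.explode("Names")

# Drop names repeated within the same (code1, code2) group,
# keeping the first occurrence.
deduped = exploded.drop_duplicates(subset=["code1", "code2", "Names"])

# Collect the remaining names back into one list per group.
out = deduped.groupby(["code1", "code2"])["Names"].agg(list).reset_index()

# Join into strings only after the duplicates are gone.
out["Names_str"] = out["Names"].apply(",".join)
print(out)
```

Because the aggregation here works on scalar strings rather than on lists, it sidesteps summing list objects inside the groupby altogether.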