I have the DataFrame below:
import pandas as pd

df = pd.DataFrame({
    'cnpj': [410000132, 410000132, 4830624000197, 4830624000197, 4830624000197],
    'Nome Pessoa': ['EUGENIO LUPORINI NETO', 'JUAN MATIAS SERAGOPIAN', 'EUGENIO LUPORINI NETO', 'SIMONE FANKHAUSER', 'ALEX SOUZA'],
})
print(df)
            cnpj             Nome Pessoa
0      410000132   EUGENIO LUPORINI NETO
1      410000132  JUAN MATIAS SERAGOPIAN
2  4830624000197   EUGENIO LUPORINI NETO
3  4830624000197       SIMONE FANKHAUSER
4  4830624000197              ALEX SOUZA
Each cnpj is a company. Each Nome Pessoa is a person. For each Nome Pessoa, I want to list the other people who appear under the same cnpj as that person (preferably with no duplicates). In other words, I want to list how people are related, using cnpj as the key, so that the df looks like this (or at least close to it):
            cnpj             Nome Pessoa  Relations
0      410000132   EUGENIO LUPORINI NETO  ['JUAN MATIAS SERAGOPIAN','SIMONE FANKHAUSER','ALEX SOUZA']
1      410000132  JUAN MATIAS SERAGOPIAN  ['EUGENIO LUPORINI NETO']
2  4830624000197   EUGENIO LUPORINI NETO  ['JUAN MATIAS SERAGOPIAN','SIMONE FANKHAUSER','ALEX SOUZA']
3  4830624000197       SIMONE FANKHAUSER  ['EUGENIO LUPORINI NETO','ALEX SOUZA']
4  4830624000197              ALEX SOUZA  ['EUGENIO LUPORINI NETO','SIMONE FANKHAUSER']
For instance, df['Relations'][0] is ['JUAN MATIAS SERAGOPIAN','SIMONE FANKHAUSER','ALEX SOUZA'] because JUAN MATIAS SERAGOPIAN appears under the same cnpj as EUGENIO LUPORINI NETO (410000132), while SIMONE FANKHAUSER and ALEX SOUZA appear together with EUGENIO under the other cnpj (4830624000197).
I suppose the solution involves groupby, but I am not sure how to achieve it.
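To make the intended result concrete, here is a minimal sketch of one direction that might work: a self-merge on cnpj to pair people who share a company, followed by a groupby to collect the distinct names per person. This is only an illustration under that assumption, not necessarily the best approach. The names pairs and relations and the _other suffix are hypothetical helpers that do not exist in the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'cnpj': [410000132, 410000132, 4830624000197, 4830624000197, 4830624000197],
    'Nome Pessoa': ['EUGENIO LUPORINI NETO', 'JUAN MATIAS SERAGOPIAN',
                    'EUGENIO LUPORINI NETO', 'SIMONE FANKHAUSER', 'ALEX SOUZA'],
})

# Pair every person with every person that shares a cnpj (self-merge on the key).
# The right-hand name column gets the hypothetical suffix '_other'.
pairs = df.merge(df, on='cnpj', suffixes=('', '_other'))

# Drop the pairs where a person is matched with themselves.
pairs = pairs[pairs['Nome Pessoa'] != pairs['Nome Pessoa_other']]

# Collect the distinct co-occurring names per person, across all their cnpjs.
relations = (
    pairs.groupby('Nome Pessoa')['Nome Pessoa_other']
    .unique()
    .apply(list)
)

# Map each person's list back onto the original rows.
# A person who shares no cnpj with anyone would get NaN here.
df['Relations'] = df['Nome Pessoa'].map(relations)
print(df)
```

Because the lists are built per person rather than per row, both EUGENIO LUPORINI NETO rows receive the same list, which matches the desired output above. The self-merge grows quadratically with the number of people per cnpj, so it may be slow for very large companies.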