I have this df:

nome_socio   cnpj_cpf_socio   municipio
Alexandre    AAA              Curitiba
Alexandre    AAA              Rio
Alexandre    AAA              Porto Alegre
Bruno        BBB              Porto Alegre
Bruno        BBB              Porto Alegre  

I want to get the mode for rows with the same nome_socio and cnpj_cpf_socio. For that I'm using the following code:

moda_municipio=df[['nome_socio','cnpj_cpf_socio','municipio']].groupby(['nome_socio','cnpj_cpf_socio'])['municipio'].apply(pd.Series.mode).to_frame().reset_index().rename(columns={'municipio':"cidade_pred"})

It does find the mode. However, for the Alexandre + AAA rows there is a three-way tie between the municipios, so it returns three separate rows. I'm getting this result:

  nome_socio cnpj_cpf_socio  level_2   cidade_pred
0  Alexandre            AAA        0      Curitiba
1  Alexandre            AAA        1  Porto Alegre
2  Alexandre            AAA        2           Rio
3      Bruno            BBB        0  Porto Alegre

I need to make it look like this:

  nome_socio cnpj_cpf_socio  level_2                      cidade_pred
   Alexandre            AAA        0      Curitiba, Porto Alegre, Rio
       Bruno            BBB        0                     Porto Alegre

Is there a way to do it?

aabujamra
  • Check [my answer](https://stackoverflow.com/a/54304691/4909087) (ctrl+F "multiple modes"). The solution is to change `.apply(pd.Series.mode)` to `.agg(pd.Series.mode)`. – cs95 May 30 '20 at 23:08
  • @cs95 thanks for that, your answer does work. I noticed you asked to close the question. However, even though your answer to the other question covers it, the two questions are completely different. The downside of closing it is that people who have this same problem will not find it when googling. – aabujamra May 30 '20 at 23:24
  • What you just said sounds more like an argument _for_ duping, IMO. You graciously confirmed that my answer contains the context needed to solve your question. By marking your Q as a duplicate post, the hope is that other people coming to your question will find a link to the canonical post, and everyone wins. PS, closing a question does not make it less searchable on google. – cs95 May 30 '20 at 23:27

1 Answer

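Compute the mode within each group first, then join the resulting values into a single string. Note that the column to aggregate is municipio, since cidade_pred does not exist in df yet:

df.groupby(['nome_socio','cnpj_cpf_socio'])['municipio'].agg(lambda x: ', '.join(x.mode().tolist()))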
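
For reference, here is a minimal self-contained sketch of the same idea, rebuilding the sample df from the question. The `reset_index(name="cidade_pred")` step is my own addition to get the column name the question asks for; it is one way to do it, not the only one.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "nome_socio": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "cnpj_cpf_socio": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "municipio": ["Curitiba", "Rio", "Porto Alegre", "Porto Alegre", "Porto Alegre"],
})

# Series.mode() returns every tied value (sorted), so joining them
# collapses each group to exactly one string.
moda_municipio = (
    df.groupby(["nome_socio", "cnpj_cpf_socio"])["municipio"]
      .agg(lambda x: ", ".join(x.mode()))
      .reset_index(name="cidade_pred")  # assumption: desired output column name
)

print(moda_municipio)
```

Because the lambda returns a single string per group, there is no extra level_2 index column in the result, only nome_socio, cnpj_cpf_socio and cidade_pred.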
BENY