Using the code below, I get the following output, but I need to plot it (ggplot). My understanding is that I first need to convert the Series to a DataFrame.

Can someone help me make my current dataset (A) look like DataFrame B below?

A) Current

    ds_perg1_2_merged = df_perg1_2.groupby(['DescricaoProblema'])['strRazaoSocial'].apply(lambda x: x.value_counts().head(3))


    DescricaoProblema                                
    Cobrança indevida.         CAIXA ECONOMICA FEDERAL                 66
                               CAIXA SEGUROS S.A                       45
                               BANCO BMG S.A.                          38
    Cobrança indevida/abusiva  CLARO S/A                               50
                               TIM CELULAR S/A                         47
                               COMPANHIA PIRATININGA DE FORÇA E LUZ    34
    Produto com vício          VIA VAREJO S/A                          46
                               SAMSUNG ELETRONICA DA AMAZONIA LTDA     27
                               WHIRLPOOL S/A                           23

    ds_perg1_2_merged.info()
    <class 'pandas.core.series.Series'>
    MultiIndex: 9 entries, ('Cobrança indevida.', 'CAIXA ECONOMICA FEDERAL') to ('Produto com vício', 'ELECTROLUX DO BRASIL S/A')
    Series name: strRazaoSocial
    Non-Null Count  Dtype
    --------------  -----
    9 non-null      int64
    dtypes: int64(1)
    memory usage: 568.0+ bytes

B) Need to be:

    DescricaoProblema          strRazaoSocial                      amount
    Cobrança indevida.         CAIXA ECONOMICA FEDERAL                 66
                               CAIXA SEGUROS S.A                       45
                               BANCO BMG S.A.                          38
    Cobrança indevida/abusiva  CLARO S/A                               50
                               TIM CELULAR S/A                         47
                               COMPANHIA PIRATININGA DE FORÇA E LUZ    34
    Produto com vício          VIA VAREJO S/A                          46
                               SAMSUNG ELETRONICA DA AMAZONIA LTDA     27
                               WHIRLPOOL S/A                           23

EDIT: OK, I resolved half of the issue using `ds_perg1_2_merged.to_frame()`, but the third column (the values) still needs its own column name. Not sure I'm on the right path, though.

songbird159

2 Answers

1

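Because the function you apply returns a Series for each group, the result is indexed by a MultiIndex: the group key (`DescricaoProblema`) plus the index of each `value_counts()` result.

You can use the `reset_index` method (see docs) to turn the MultiIndex levels into columns of a DataFrame.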

For your example it would give something like:

> ds_perg1_2_merged.reset_index()

           DescricaoProblema                        strRazaoSocial    
0         Cobrança indevida.               CAIXA ECONOMICA FEDERAL  66
1         Cobrança indevida.                     CAIXA SEGUROS S.A  45
2         Cobrança indevida.                        BANCO BMG S.A.  38
3  Cobrança indevida/abusiva                             CLARO S/A  50
4  Cobrança indevida/abusiva                       TIM CELULAR S/A  47
5  Cobrança indevida/abusiva  COMPANHIA PIRATININGA DE FORÇA E LUZ  34
6          Produto com vício                        VIA VAREJO S/A  46
7          Produto com vício   SAMSUNG ELETRONICA DA AMAZONIA LTDA  27
8          Produto com vício                         WHIRLPOOL S/A  23
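
If the column names come out differently in your pandas version (an unnamed index level shows up as `level_1`), one option is to name everything explicitly before resetting the index. This is only a minimal sketch on made-up toy data standing in for `df_perg1_2`; the column name `amount` is taken from the desired output in the question, and the variable names `counts` and `result` are just for illustration:

```python
import pandas as pd

# Toy data standing in for df_perg1_2 (values are invented for illustration).
df = pd.DataFrame({
    "DescricaoProblema": ["A", "A", "A", "B", "B"],
    "strRazaoSocial": ["X", "X", "Y", "Z", "Z"],
})

# Same pattern as in the question: top 3 companies per problem description.
counts = df.groupby("DescricaoProblema")["strRazaoSocial"].apply(
    lambda s: s.value_counts().head(3)
)

# Name both index levels, then move them into columns;
# `name=` sets the label of the column that holds the counts.
result = (
    counts
    .rename_axis(["DescricaoProblema", "strRazaoSocial"])
    .reset_index(name="amount")
)
print(result)
```

Naming the levels with `rename_axis` and the values with `reset_index(name=...)` avoids a separate `rename` call afterwards.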
kev
  • Thank you! This is very close, but the column names came out mixed up: I got `DescricaoProblema level_1 strRazaoSocial` where I expected `DescricaoProblema strRazaoSocial level_1`. As it stands, DescricaoProblema is fine, but level_1 holds the company names that belong in strRazaoSocial, and strRazaoSocial holds the counts. – songbird159 Jul 03 '22 at 00:40
  • Ok, I was able to tweak using `df2.rename(columns = {'level_1':'strRazaoSocial', 'strRazaoSocial':'Qtd'}, inplace = True)` :) Ty again! – songbird159 Jul 03 '22 at 00:49
0
    ds_perg1_2_merged = df_perg1_2.groupby(['DescricaoProblema'], as_index=False)['strRazaoSocial'].apply(lambda x: x.value_counts().head(3))
BeRT2me