python pandas - how to transform ds into dataframe

Question

The code below produces the output shown under A. I need to plot it with ggplot, and my understanding is that I first have to turn the Series (DS) into a DataFrame (DF).

Can someone help me make my current data (A) look like the DataFrame in B below?

A) Current

    ds_perg1_2_merged = df_perg1_2.groupby(['DescricaoProblema'])['strRazaoSocial'].apply(lambda x: x.value_counts().head(3))


    DescricaoProblema                                
    Cobrança indevida.         CAIXA ECONOMICA FEDERAL                 66
                               CAIXA SEGUROS S.A                       45
                               BANCO BMG S.A.                          38
    Cobrança indevida/abusiva  CLARO S/A                               50
                               TIM CELULAR S/A                         47
                               COMPANHIA PIRATININGA DE FORÇA E LUZ    34
    Produto com vício          VIA VAREJO S/A                          46
                               SAMSUNG ELETRONICA DA AMAZONIA LTDA     27
                               WHIRLPOOL S/A                           23

    ds_perg1_2_merged.info()
    <class 'pandas.core.series.Series'>
    MultiIndex: 9 entries, ('Cobrança indevida.', 'CAIXA ECONOMICA FEDERAL') to ('Produto com vício', 'ELECTROLUX DO BRASIL S/A')
    Series name: strRazaoSocial
    Non-Null Count  Dtype
    --------------  -----
    9 non-null      int64
    dtypes: int64(1)
    memory usage: 568.0+ bytes

B) Desired:

    DescricaoProblema          strRazaoSocial                      amount
    Cobrança indevida.         CAIXA ECONOMICA FEDERAL                 66
                               CAIXA SEGUROS S.A                       45
                               BANCO BMG S.A.                          38
    Cobrança indevida/abusiva  CLARO S/A                               50
                               TIM CELULAR S/A                         47
                               COMPANHIA PIRATININGA DE FORÇA E LUZ    34
    Produto com vício          VIA VAREJO S/A                          46
                               SAMSUNG ELETRONICA DA AMAZONIA LTDA     27
                               WHIRLPOOL S/A                           23

EDIT: OK, I solved half of the issue with ds_perg1_2_merged.to_frame(), but the third column (the counts) still needs its own column name. I'm not sure I'm on the right track, though.

score 1 · Accepted Answer · answered Jul 02 '22 at 21:55

When you group by several columns, or, as here, apply a function that returns a Series for each group, the result has a MultiIndex.

You can use the reset_index method (see docs) to transform the MultiIndex into columns of a DataFrame.

For your example it would give something like:

> ds_perg1_2_merged.reset_index()

           DescricaoProblema                        strRazaoSocial    
0         Cobrança indevida.               CAIXA ECONOMICA FEDERAL  66
1         Cobrança indevida.                     CAIXA SEGUROS S.A  45
2         Cobrança indevida.                        BANCO BMG S.A.  38
3  Cobrança indevida/abusiva                             CLARO S/A  50
4  Cobrança indevida/abusiva                       TIM CELULAR S/A  47
5  Cobrança indevida/abusiva  COMPANHIA PIRATININGA DE FORÇA E LUZ  34
6          Produto com vício                        VIA VAREJO S/A  46
7          Produto com vício   SAMSUNG ELETRONICA DA AMAZONIA LTDA  27
8          Produto com vício                         WHIRLPOOL S/A  23
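
A minimal sketch of the same idea on made-up data, in case the exact column names matter. The toy frame and the company names are hypothetical, and `amount` is just the column name used in the question. It assumes that naming the index levels with `rename_axis` first and passing `name=` to `reset_index` is enough to get the three columns in the desired order:

```python
import pandas as pd

# Hypothetical toy data standing in for df_perg1_2.
df = pd.DataFrame({
    "DescricaoProblema": ["Cobrança indevida."] * 3 + ["Produto com vício"] * 3,
    "strRazaoSocial": ["BANCO A", "BANCO A", "BANCO B",
                       "LOJA X", "LOJA X", "LOJA X"],
})

# Same groupby/apply as in the question: a Series with a two-level MultiIndex.
counts = df.groupby("DescricaoProblema")["strRazaoSocial"].apply(
    lambda s: s.value_counts().head(3)
)

# Name both index levels explicitly, then move them into columns.
# name="amount" sets the label of the column holding the counts.
tidy = (
    counts.rename_axis(["DescricaoProblema", "strRazaoSocial"])
          .reset_index(name="amount")
)
print(tidy)
```

Naming the levels up front should avoid the auto-generated `level_1` label, since `reset_index` then has a name for every level.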

Thank you! This is very close, but the column names came out mixed up: I got `DescricaoProblema level_1 strRazaoSocial` when I expected `DescricaoProblema strRazaoSocial level_1`. As it stands, DescricaoProblema is OK, but level_1 holds the company names that belong under strRazaoSocial, and strRazaoSocial holds the counts. — songbird159, Jul 03 '22 at 00:40
OK, I was able to fix it with `df2.rename(columns = {'level_1':'strRazaoSocial', 'strRazaoSocial':'Qtd'}, inplace = True)` :) Thanks again! — songbird159, Jul 03 '22 at 00:49
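
For reference, a small sketch of the rename fix from the comment above. The frame `df2` is built by hand here (hypothetical rows) so that it starts with the mislabelled columns described in the comment. It uses the assignment form of `rename` rather than `inplace=True`, which is only a style choice:

```python
import pandas as pd

# Hypothetical frame with the column labels described in the comment:
# level_1 holds the company names, strRazaoSocial holds the counts.
df2 = pd.DataFrame({
    "DescricaoProblema": ["Produto com vício", "Produto com vício"],
    "level_1": ["VIA VAREJO S/A", "WHIRLPOOL S/A"],
    "strRazaoSocial": [46, 23],
})

# Each old label is mapped independently, so the two renames do not collide.
df2 = df2.rename(columns={"level_1": "strRazaoSocial", "strRazaoSocial": "Qtd"})
print(df2)
```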

score 0 · Answer 2 · answered Jul 02 '22 at 21:41

    ds_perg1_2_merged = df_perg1_2.groupby(['DescricaoProblema'], as_index=False)['strRazaoSocial'].apply(lambda x: x.value_counts().head(3))
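
An alternative sketch that does not depend on how `apply` assembles its result. This is a different technique from the line above, not a confirmed equivalent. It counts each (problem, company) pair with `size()` and then keeps the three largest counts per problem. The toy data and the `amount` column name are assumptions for illustration:

```python
import pandas as pd

# Hypothetical toy data: 4 companies under problem "P", 2 under problem "Q".
df = pd.DataFrame({
    "DescricaoProblema": ["P"] * 10 + ["Q"] * 3,
    "strRazaoSocial": ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] + ["E"] * 2 + ["F"],
})

# Count each (problem, company) pair and get a flat DataFrame right away.
pair_counts = (
    df.groupby(["DescricaoProblema", "strRazaoSocial"])
      .size()
      .reset_index(name="amount")
)

# Sort by count within each problem and keep the top 3 rows per problem.
top3 = (
    pair_counts.sort_values(["DescricaoProblema", "amount"],
                            ascending=[True, False])
               .groupby("DescricaoProblema")
               .head(3)
               .reset_index(drop=True)
)
print(top3)
```

The result is already a plain DataFrame with three named columns, so no index reshaping is needed before plotting.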

BeRT2me
