1

Problem

I'm new to Python and am trying to pivot a table into the output I want.

Reproducible Data

I have this table and I am trying to pivot it into the expected output.

pd.DataFrame({'A': ['1.Food', '1.1Bread', '1.2Chicken', 'Car', 'Animal', 'Ball'], 'Val1': [10, 14, 94, 13, 49, 89], 'Val2': [1,2,3,4,5,6], 'Val3' : [100, 120, 130, 140, 150, 160]}, 
                  columns=['A', 'Val1', 'Val2', 'Val3'])


    A         Val1 Val2 Val3
0   1.Food      10  1   100
1   1.1Bread    14  2   120
2   1.2Chicken  94  3   130
3   Car         13  4   140
4   Animal      49  5   150
5   Ball        89  6   160

Expected Output

Output

Could someone please share with me how to pivot it properly? Thank you!

Update

I've used melt and it in a way creates what I want, but I am still unable to get my subcategory.

t1.melt(id_vars = ['A'])




      A     variable    value
0   1.Food  Val1         10
1   1.1Bread    Val1    14
2   1.2Chicken  Val1    94
3   Car     Val1        13
4   Animal  Val1        49
5   Ball    Val1        89
6   1.Food  Val2        1
7   1.1Bread    Val2    2
8   1.2Chicken  Val2    3
9   Car     Val2        4
10  Animal  Val2        5
11  Ball    Val2        6
12  1.Food  Val3        100
13  1.1Bread    Val3    120
14  1.2Chicken  Val3    130
15  Car     Val3        140
16  Animal  Val3        150
17  Ball    Val3        160
Javier
  • 730
  • 4
  • 17
  • Hi @Javier, I think you can find your answer here: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe. – U3.1415926 Mar 17 '20 at 10:11
  • Not really, I realized I need to use melt instead of pivot. Can't seem to create the subcategory though... – Javier Mar 17 '20 at 10:31

1 Answers1

2

I could not find a truely direct way, so I did it step by step:

  • identify a numerical category from the number before a dot in column A
  • extract a SubCategory when column A contains 2 numbers separated with a dot
  • build a Category column
    • use column A if SubCategory is NaN
    • use a groupby with the numerical category to fill the Category column

When this a done, a simple stack gives the result. Code could be:

df = pd.DataFrame({'A': ['1.Food', '1.1Bread', '1.2Chicken', 'Car', 'Animal', 'Ball'], 'Val1': [10, 14, 94, 13, 49, 89], 'Val2': [1,2,3,4,5,6], 'Val3' : [100, 120, 130, 140, 150, 160]},
                  columns=['A', 'Val1', 'Val2', 'Val3'])

df['Categ'] = df['A'].str.extract(r'^(\d+)\.')
df['SubCategory'] = df['A'].str.extract(r'^(\d+\.\d+.*)')
df.loc[df['SubCategory'].isna(),'Category'] = df.loc[
    df['SubCategory'].isna(),'A']

df.loc[~ df['Categ'].isna(), 'Category'] = df[~ df['Categ'].isna()].groupby(
    'Categ')['Category'].apply(lambda x: x.bfill().ffill())

resul = df.set_index(['Category','SubCategory'])[
    ['Val1', 'Val2', 'Val3']].stack().reset_index().rename(
        columns={'level_2': 'ValueType', 0: 'Value'})

It gives as expected:

   Category SubCategory ValueType  Value
0    1.Food         NaN      Val1     10
1    1.Food         NaN      Val2      1
2    1.Food         NaN      Val3    100
3    1.Food    1.1Bread      Val1     14
4    1.Food    1.1Bread      Val2      2
5    1.Food    1.1Bread      Val3    120
6    1.Food  1.2Chicken      Val1     94
7    1.Food  1.2Chicken      Val2      3
8    1.Food  1.2Chicken      Val3    130
9       Car         NaN      Val1     13
10      Car         NaN      Val2      4
11      Car         NaN      Val3    140
12   Animal         NaN      Val1     49
13   Animal         NaN      Val2      5
14   Animal         NaN      Val3    150
15     Ball         NaN      Val1     89
16     Ball         NaN      Val2      6
17     Ball         NaN      Val3    160
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • Hey @Serge, I am facing this error: 'Series' object has no attribute 'isna.' But I solved it when I changed isna to isnull instead :) – Javier Mar 17 '20 at 21:12