2

Let's say we have dataframe like this

df = pd.DataFrame({ 
                "metric": ["1","2","1" ,"1","2"],
                "group1":["o", "x", "x" , "o", "x"],
                "group2":['a', 'b', 'a', 'a', 'b'] ,
                "value": range(5),
                "value2": np.array(range(5))* 2})

df

    metric  group1  group2  value   value2
0   1         o      a         0    0
1   2         x      b         1    2
2   1         x      a         2    4
3   1         o      a         3    6
4   2         x      b         4    8

then I want to have pivot format

df['g'] = df.groupby(['group1','group2'])['group2'].cumcount()
df1 = df.pivot(index=['g','metric'], columns=['group1','group2'], values=['value','value2']).sort_index(axis=1).rename_axis(columns={'g':None})


            value       value2
   group1   o   x       o   x
   group2   a   a   b   a   a   b
g  metric                       
0   1       0.0 2.0 NaN 0.0 4.0 NaN
    2       NaN NaN 1.0 NaN NaN 2.0
1   1       3.0 NaN NaN 6.0 NaN NaN
    2       NaN NaN 4.0 NaN NaN 8.0

From here we can see that ("value","o","b") and ("value2","o","b") not exist after making pivot

but I need to have those columns with values NA So I tried;

cols = [('value','x','a'), ('value','o','a'),('value','o','b')]

df1.assign(**{col : "NA" for col in np.setdiff1d(cols, df1.columns.values)})

which gives

enter image description here

Expected output

            value           value2
   group1   o       x       o       x
   group2   a   b   a   b   a   b   a   b
g  metric                       
0   1       0.0 NaN 2.0 NaN 0.0 NaN 4.0 NaN
    2       NaN NaN NaN 1.0 NaN NaN NaN 2.0
1   1       3.0 NaN NaN NaN 6.0 NaN NaN NaN
    2       NaN NaN NaN 4.0 NaN NaN NaN 8.0

one corner case with this is that if b does not exist how to create that column ?

           value     value2
   group1   o   x    o  x
   group2   a   a    a  a   
g  metric                       
0   1       0.0 2.0  0.0 4.0    
    2       NaN NaN  NaN NaN    
1   1       3.0 NaN  6.0 NaN    
    2       NaN NaN  NaN NaN    

Multiple insert columns if not exist pandas

Pandas: Check if column exists in df from a list of columns

Pandas - How to check if multi index column exists

Alexander
  • 4,527
  • 5
  • 51
  • 98

1 Answers1

2

Use DataFrame.stack with DataFrame.unstack:

df1 = df1.stack([1,2],dropna=False).unstack([2,3])
print (df1)
         value               value2              
group1       o        x           o        x     
group2       a   b    a    b      a   b    a    b
g metric                                         
0 1        0.0 NaN  2.0  NaN    0.0 NaN  4.0  NaN
  2        NaN NaN  NaN  1.0    NaN NaN  NaN  2.0
1 1        3.0 NaN  NaN  NaN    6.0 NaN  NaN  NaN
  2        NaN NaN  NaN  4.0    NaN NaN  NaN  8.0

Or with selecting last and last previous levels:

df1 = df1.stack([-2,-1],dropna=False).unstack([-2,-1])

Another idea:

df1 = df1.reindex(pd.MultiIndex.from_product(df1.columns.levels), axis=1)
print (df1)
         value               value2              
group1       o        x           o        x     
group2       a   b    a    b      a   b    a    b
g metric                                         
0 1        0.0 NaN  2.0  NaN    0.0 NaN  4.0  NaN
  2        NaN NaN  NaN  1.0    NaN NaN  NaN  2.0
1 1        3.0 NaN  NaN  NaN    6.0 NaN  NaN  NaN
  2        NaN NaN  NaN  4.0    NaN NaN  NaN  8.0

EDIT:

If need set new columns by list of tuples:

cols = [('value','x','a'), ('value','o','a'),('value','o','b')]

df = df1.reindex(pd.MultiIndex.from_tuples(cols).union(df1.columns), axis=1)
print (df)
         value               value2          
             o        x           o    x     
             a   b    a    b      a    a    b
g metric                                     
0 1        0.0 NaN  2.0  NaN    0.0  4.0  NaN
  2        NaN NaN  NaN  1.0    NaN  NaN  2.0
1 1        3.0 NaN  NaN  NaN    6.0  NaN  NaN
  2        NaN NaN  NaN  4.0    NaN  NaN  8.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks for quick post. I want to generalize the solution like using for loop. Assume we have 10-20 columns to check if the column exist or not in the pivot table. and if the column combination is not exist create that. – Alexander Sep 10 '21 at 05:43
  • one other corner case I having with the issue is that if`b` not exist at all in `group2` how can we create that sub column filled with `NA`s. – Alexander Sep 10 '21 at 05:50
  • 1
    @Alexander - So need last EDIT? Input is list of tuples for columns names? – jezrael Sep 10 '21 at 05:52
  • yes, please that's actually one of the issue I'm also facing with this type of dataset. The column not exist in the long format dataframe but I need to show it in the pivot format. – Alexander Sep 10 '21 at 05:54