Adding level to middle of DF in Pandas

Question

I would like to add a new level to my DF (so that then I can use pd.reindex to do something else). My DF basically has something like this:

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('A','c'): [-1,1,0,10,12],
                   ('A','d'): [1,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,-1,200],
                   ('B','c'): [-20,-10,0,10,20],
                   ('B','d'): [-200,-100,0,100,200]
})

##df
    A               B
    a   b   c   d   a   b     c     d
0   -1  0   1   1   -20 -200  -20   -200
1   -1  1   -1  1   -10 -100  -10   -100
2   0   2   0   2   0   0     0     0
3   10  3   10  3   10  -1    10    100
4   12  -1  12  -1  20  200   20    200

I want to assign new level keys L1 = a+b, and L2=c+d. How do I do this?

The desired output would be

##df
    A               B
    L1      L2      L1        L2
    a   b   c   d   a   b     c     d
0   -1  0   1   1   -20 -200  -20   -200
1   -1  1   -1  1   -10 -100  -10   -100
2   0   2   0   2   0   0     0     0
3   10  3   10  3   10  -1    10    100
4   12  -1  12  -1  20  200   20    200

Edit: the objective is to achieve something similar to what was asked in here. This means that some rows will have NAs for the same KEY, depending on the value of other columns. E.g. if I want to filter columns a and c by respectively testing whether columns b and d are negative:

##df
    A               B
    L1      L2      L1        L2
    a   b   c   d   a   b     c     d
0   -1  0   1   1   NA  NA    NA    NA
1   -1  1   -1  1   NA  NA    NA    NA
2   0   2   0   2   0   0     0     0
3   10  3   10  3   NA  NA    10    100
4   NA  NA  NA  NA  20  200   20    200

"so that then I can use pd.reindex to do something else": perhaps we can help you better if you state what is your final purpose. — hyamanieu, Feb 27 '17 at 13:35
Well, the final purpose is very similar to what I had asked in here (http://stackoverflow.com/questions/42246373/replacing-values-in-a-2nd-level-column-on-multiindex-df-in-pandas). The difference now is that I have to do the parsing of columns `a` and `b` (as I did in the previous post), and the same for `c` and `d`. — Sos, Feb 27 '17 at 13:40
I'm going to change the working example to reflect this comment — Sos, Feb 27 '17 at 13:40
@Wli I have added some information which I hope may help in assisting me :) — Sos, Feb 27 '17 at 13:47

jezrael · Accepted Answer · 2017-03-03T11:48:35.783

You need create new array with map and then assign:

d = {'a':'L1','b':'L1','c':'L2','d':'L2'}
a = df.columns.get_level_values(1).map(lambda x: d[x])
print (a)
['L1' 'L1' 'L2' 'L2' 'L1' 'L1' 'L2' 'L2']

df.columns = [df.columns.get_level_values(0),a,df.columns.get_level_values(1)]
#same as
df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                        df.columns.get_level_values(1).map(lambda x: d[x]),
                                        df.columns.get_level_values(1)])
print (df)
    A             B              
   L1     L2     L1       L2     
    a  b   c  d   a    b   c    d
0  -1  0  -1  1 -20 -200 -20 -200
1  -1  1   1  1 -10 -100 -10 -100
2   0  2   0  2   0    0   0    0
3  10  3  10  3  10   -1  10  100
4  12 -1  12 -1  20  200  20  200

Second output is really complicated, for me works:

#filter columns
idx = pd.IndexSlice
mask = df.loc[:, idx[:,:,['b','d']]] < 0
print (mask)
       A             B       
      L1     L2     L1     L2
       b      d      b      d
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False
3  False  False   True  False
4   True   True  False  False

#create mask to columns a,c
mask1 = mask.reindex(columns=df.columns)
mask1 = mask1.groupby(level=[0,1], axis=1).apply(lambda x: x.bfill(axis=1))
print (mask1)
       A                           B                     
      L1            L2            L1            L2       
       a      b      c      d      a      b      c      d
0  False  False  False  False   True   True   True   True
1  False  False  False  False   True   True   True   True
2  False  False  False  False  False  False  False  False
3  False  False  False  False   True   True  False  False
4   True   True   True   True  False  False  False  False

print (df.mask(mask1))
      A                     B                    
     L1         L2         L1           L2       
      a    b     c    d     a      b     c      d
0  -1.0  0.0  -1.0  1.0   NaN    NaN   NaN    NaN
1  -1.0  1.0   1.0  1.0   NaN    NaN   NaN    NaN
2   0.0  2.0   0.0  2.0   0.0    0.0   0.0    0.0
3  10.0  3.0  10.0  3.0   NaN    NaN  10.0  100.0
4   NaN  NaN   NaN  NaN  20.0  200.0  20.0  200.0

Another solution with reindex and method='bfill', but is necessary double transpose (I think it is bug - works only with MultiIndex in index, not with MultiIndex in columns):

idx = pd.IndexSlice
mask = df.loc[:, idx[:,['b','d']]] < 0
print (mask)
       A             B       
       b      d      b      d
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False
3  False  False   True  False
4   True   True  False  False

mask1 = mask.T.reindex(df.columns, method='bfill').T
print (mask1)
       A                           B                     
       a      b      c      d      a      b      c      d
0  False  False  False  False   True   True   True   True
1  False  False  False  False   True   True   True   True
2  False  False  False  False  False  False  False  False
3  False  False  False  False   True   True  False  False
4   True   True   True   True  False  False  False  False

print (df.mask(mask1))
      A                     B                    
      a    b     c    d     a      b     c      d
0  -1.0  0.0  -1.0  1.0   NaN    NaN   NaN    NaN
1  -1.0  1.0   1.0  1.0   NaN    NaN   NaN    NaN
2   0.0  2.0   0.0  2.0   0.0    0.0   0.0    0.0
3  10.0  3.0  10.0  3.0   NaN    NaN  10.0  100.0
4   NaN  NaN   NaN  NaN  20.0  200.0  20.0  200.0

I am getting the following error when setting `mask = df.loc[:, idx[:,:,['b','d']]] < 0`: `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (2)'`. Changing your `mask` to `mask = df.loc[:, idx[:,['b','d']]] < 0` solves the problem with your working example and yields the correct `print`. However, in my real example I also get a similar error `the index to be fully lexsorted tuple len (3), lexsort depth (1)'`, and I am not able to solve it and get the correct print — Sos, Feb 28 '17 at 08:55
I think you need `df = df.sort_index(axis=1)` first, check [docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html#sorting-a-multiindex) — jezrael, Feb 28 '17 at 09:04
Thank you so much @jezrael, I learned a lot with this and your previous answer to my post! I am truly grateful! — Sos, Feb 28 '17 at 09:13
MultiIndexes are really hard, this question is not so nice because there is a hack - `groupby` with `bfill`, but i have no idea how do it better. — jezrael, Feb 28 '17 at 09:23

Adding level to middle of DF in Pandas

1 Answers1