2

I was looking into this post which almost solved my problem. However, in my case, I want to work based on the 2nd level of the df, but trying not to specify my 1st level column names explicitly.

Borrowing the original dataframe:

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

##df
    A   B
    a   b   a   b
0   -1  0   -20 -200
1   -1  1   -10 -100
2   0   2   0   0
3   10  3   10  100
4   12  -1  20  200

I want to assign NA to all columns a and b where b<0. I was selecting them based on: df.xs('b',axis=1,level=1)<b, but then I cannot actually perform the replace. However, I have varying 1st level names, so the indexing there cannot be made based on A and B explicitly, but possibly through df.columns.values?

The desired output would be

##df
    A   B
    a   b   a   b
0   -1  0   NA  NA
1   -1  1   NA  NA
2   0   2   0   0
3   10  3   10  100
4   NA  NA  20  200

I appreciate all tips, thank you in advance.

Community
  • 1
  • 1
Sos
  • 1,783
  • 2
  • 20
  • 46

1 Answers1

2

You can use DataFrame.mask with reindex for same index and column names as original DataFrame created by reindex:

mask = df.xs('b',axis=1,level=1) < 0
print (mask)
       A      B
0  False   True
1  False   True
2  False  False
3  False  False
4   True  False

print (mask.reindex(columns = df.columns, level=0))
       A             B       
       a      b      a      b
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False
3  False  False  False  False
4   True   True  False  False

df = df.mask(mask.reindex(columns = df.columns, level=0))
print (df)
      A          B       
      a    b     a      b
0  -1.0  0.0   NaN    NaN
1  -1.0  1.0   NaN    NaN
2   0.0  2.0   0.0    0.0
3  10.0  3.0  10.0  100.0
4   NaN  NaN  20.0  200.0

Edit by OP: I had asked in comments how to consider multiple conditions (e.g. df.xs('b',axis=1,level=1) < 0 OR df.xs('b',axis=1,level=1) being an NA). @Jezrael kindly indicated that if I wanted to do this, I should consider

mask=(df.xs('b',axis=1,level=1) < 0 | df.xs('b',axis=1,level=1).isnull())
Sos
  • 1,783
  • 2
  • 20
  • 46
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252