Pandas dropna with multilevel column index

Question

So I have got this Pandas DataFrame with multilevel index for the columns:

   group1    group2    group3
   1    2    1    2    1    2
0  ...  ...  NaN  ...  ...  ...
1  NaN  ...  ...  ...  ...  ...
2  ...  ...  ...  ...  NaN  ...

Now i want to drop the rows where the columns group2 and group3 have NaN values. Which equates to rows 0 and 2 in this instance.

According to my understanding of the documentation this should work:

df.dropna(axis = 'rows', subset = ['group2', 'group3'])

But it does not. Instead I get the error:

KeyError: ['group2', 'group3']

Could someone please point out to me how to properly specify the subset?

Kind regards, Rasmus

Update

So it seems like .dropna() cannot work with mulitlevel column indexes. In the end I went with the less elegant, but workable method suggested, slightly rewritten:

mask_nan = df[['group2', 'group3']].isna().any(axis = 'columns')
df[~mask_nan]    # ~ to negate / flip the boolean values

Well, `df.dropna(subset = ['column_name'])` should do the job. Can you try dropping one at a time? — Jonas Palačionis, Feb 05 '20 at 14:32

BENY · Accepted Answer · 2020-02-05T15:16:26.443

3

Seems like we can not pass the index level in dropna , so we could do

df.loc[:,['group2', 'group3']].isna().any(1)

Then

df=df[df.loc[:,['group2', 'group3']].isna().any(1)]

edited Feb 05 '20 at 15:16

answered Feb 05 '20 at 14:33

BENY

317,841
20
164
234

1

I don't think you need `loc` you can just use `df[['group2', 'group3']]` – anishtain4 Feb 05 '20 at 14:39

score 2 · Answer 2 · answered Feb 05 '20 at 14:31

2

I think this is a similiar question to yours.

import numpy as np

df = df[np.isfinite(df['group2', 'group3'])]

Only the rows where the values are finite are taken into account here.

answered Feb 05 '20 at 14:31

christheliz

176
2
15

Since I pass a total of four columns (two column groups of two a piece) would I not get a boolean array? I would need a Pandas Series, to pass back as an index mask. – user2194172 Feb 05 '20 at 14:56

score 1 · Answer 3 · answered Feb 05 '20 at 14:45

Start from detail. When you run:

idx = pd.IndexSlice
df.loc[:, idx['group2':'group3']]

You will get columns for group2 and group3:

  group2     group3    
       1   2      1   2
0    NaN   3    4.0   5
1    8.0   9   10.0  11
2   14.0  15    NaN  17

Now a more compicated expession:

df.loc[:, idx['group2':'group3']].notnull().all(axis=1)

will display a boolean Series with True where all columns are not null:

0    False
1     True
2    False
dtype: bool

So the code that you need is to use the above code in boolean indexing:

df[df.loc[:, idx['group2':'group3']].notnull().all(axis=1)]

(+ idx = pd.IndexSlice before).

Pandas dropna with multilevel column index

3 Answers3