How to drop duplicate multiindex column in Pandas

Question

Been searching the net, but I cannot find any resource on deleting duplicate multiindex column name.

Given a multiindex as below

      level1                    
      level2                    
           A   B   C   B   A   C
ONE       11  12  13  11  12  13
TWO       21  22  23  21  22  23
THREE     31  32  33  31  32  33

Drop duplicated B and C

Expected output

      level1                    
      level2                    
           A   B   C   A
ONE       11  12  13  11
TWO       21  22  23  21
THREE     31  32  33  31

Code

import pandas as pd
df = pd.DataFrame({'A': [11, 21, 31],
               'B': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df2 = pd.DataFrame({'B': [11, 21, 31],
               'A': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df.columns = pd.MultiIndex.from_product([['level1'],['level2'],df.columns ])
df2.columns = pd.MultiIndex.from_product([['level1'],['level2'],df2.columns ])
df=pd.concat([df,df2],axis=1)

-Drop by index not working

the drop index function cant tell which to drop, since they have the same names. ``df.iloc[:, :-2]`` fixes it; I would however, caution against using same names for columns — sammywemmy, Aug 07 '21 at 06:21
Thanks for the advice. This happen due to the fact several `df` been concatenated — mpx, Aug 07 '21 at 06:23
But, why do you want the duplicated column names for `A`? I'm afraid, normal dropping is not going to work for your use case, you need to have some custom dropping mechanism. — ThePyGuy, Aug 07 '21 at 06:25
This is a simple example. Please ignore the need of maintaining duplicated `A` @ThePyGuy — mpx, Aug 07 '21 at 06:26
Hi @sammywemmy, unfortunately, your suggestion is not robust for not symmetry column orientation. I have some changes to the post to reflect this — mpx, Aug 07 '21 at 06:38
Somehow this is what I need @AnuragDabas. But as per the expected output here, I will removed the `A`. But, something that I can tweak on for real use case — mpx, Aug 07 '21 at 06:54
@mozway, I can tweak based on Anurag recommendation. The example given is the closes to real use case. — mpx, Aug 07 '21 at 07:11
Well, maybe, but if you had provided a your real case example with the expected output, this would probably have saved the contributors quite some time... — mozway, Aug 07 '21 at 07:57

Anurag Dabas · Accepted Answer · 2021-08-07T07:59:21.883

2

You can try:

mask=(df.T.duplicated() | (df.columns.get_level_values(2).isin(['A','D'])))

Finally:

df=df.loc[:, mask]
#OR
#df=df.T.loc[mask].T

edited Aug 07 '21 at 07:59

answered Aug 07 '21 at 07:19

Anurag Dabas

23,866
9
21
41

1

It will be interesting to generalize this suggestion for larger set of column at the second level.Say for example,at the second level we have `A`,`B`,`C` ,`D`, and we want to remove both `B` and `C` and maintain both `A` and `D`. – mpx Aug 07 '21 at 07:42
@balandongiv In that case use `isin()` updated answer...kindly have a look **:)** – Anurag Dabas Aug 07 '21 at 07:44
1

Can't you directly do `df.loc[:, mask]`? +1 – mozway Aug 07 '21 at 07:58
1

@balandongiv kindly check the updated answer **:)** – Anurag Dabas Aug 07 '21 at 08:02

mpx · Answer 2 · 2021-08-07T08:18:18.220

Adaptation to df.T.drop_duplicates().T by Anurag Dabas.

Select only unique column and its value

drop_col=['B','C']
drop_single=[df.loc [:, (slice ( None ), slice ( None ), DCOL)].T.drop_duplicates().T for DCOL in drop_col]

Drop the columns from the df

df=df.drop ( drop_col, axis=1, level=2 )

Combine everything to get the intended output

df=pd.concat([df,*drop_single],axis=1)

Complete solution

import pandas as pd
df = pd.DataFrame({'A': [11, 21, 31],
               'B': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df2 = pd.DataFrame({'B': [11, 21, 31],
               'A': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df.columns = pd.MultiIndex.from_product([['level1'],['level2'],df.columns ])
df2.columns = pd.MultiIndex.from_product([['level1'],['level2'],df2.columns ])
df=pd.concat([df,df2],axis=1)
drop_col=['B','C']
drop_single=[df.loc [:, (slice ( None ), slice ( None ), DCOL)].iloc [:, 1] for DCOL in drop_col]

df=df.drop ( drop_col, axis=1, level=2 )
df_unique=pd.concat(drop_single,axis=1)
df=pd.concat([df,df_unique],axis=1)
print(df)

But still, the suggestion by Anurag is much more compact https://stackoverflow.com/a/68690202/6446053 — mpx, Aug 07 '21 at 07:45

score -1 · Answer 3 · answered Aug 07 '21 at 06:30

-1

You can try this:

# Drop last 2 columns of dataframe
df.drop(columns=df.columns[-2:], 
        axis=1, 
        inplace=True)

answered Aug 07 '21 at 06:30

Mohammed Hamdan

89
1
7

Thanks for the effort @Mohammad, but this suggestion is not robust for not symmetry column orientation – mpx Aug 07 '21 at 06:38
Im down voting this as this does not solve the issue – mpx Aug 08 '21 at 05:14

How to drop duplicate multiindex column in Pandas

3 Answers3