1

Been searching the net, but I cannot find any resource on deleting duplicate multiindex column name.

Given a multiindex as below

      level1                    
      level2                    
           A   B   C   B   A   C
ONE       11  12  13  11  12  13
TWO       21  22  23  21  22  23
THREE     31  32  33  31  32  33

Drop duplicated B and C

Expected output

      level1                    
      level2                    
           A   B   C   A
ONE       11  12  13  11
TWO       21  22  23  21
THREE     31  32  33  31

Code

import pandas as pd
df = pd.DataFrame({'A': [11, 21, 31],
               'B': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df2 = pd.DataFrame({'B': [11, 21, 31],
               'A': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df.columns = pd.MultiIndex.from_product([['level1'],['level2'],df.columns ])
df2.columns = pd.MultiIndex.from_product([['level1'],['level2'],df2.columns ])
df=pd.concat([df,df2],axis=1)

-Drop by index not working

Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
mpx
  • 3,081
  • 2
  • 26
  • 56
  • the drop index function cant tell which to drop, since they have the same names. ``df.iloc[:, :-2]`` fixes it; I would however, caution against using same names for columns – sammywemmy Aug 07 '21 at 06:21
  • Thanks for the advice. This happen due to the fact several `df` been concatenated – mpx Aug 07 '21 at 06:23
  • But, why do you want the duplicated column names for `A`? I'm afraid, normal dropping is not going to work for your use case, you need to have some custom dropping mechanism. – ThePyGuy Aug 07 '21 at 06:25
  • This is a simple example. Please ignore the need of maintaining duplicated `A` @ThePyGuy – mpx Aug 07 '21 at 06:26
  • Hi @sammywemmy, unfortunately, your suggestion is not robust for not symmetry column orientation. I have some changes to the post to reflect this – mpx Aug 07 '21 at 06:38
  • 1
    `df.T.drop_duplicates().T`? – Anurag Dabas Aug 07 '21 at 06:45
  • Somehow this is what I need @AnuragDabas. But as per the expected output here, I will removed the `A`. But, something that I can tweak on for real use case – mpx Aug 07 '21 at 06:54
  • Why don't you just provide your real case example? – mozway Aug 07 '21 at 07:00
  • so you want to include duplicated A as well? – Anurag Dabas Aug 07 '21 at 07:01
  • @Anurag, As per OP, only remove duplicated 'B` and `C` – mpx Aug 07 '21 at 07:08
  • @mozway, I can tweak based on Anurag recommendation. The example given is the closes to real use case. – mpx Aug 07 '21 at 07:11
  • Well, maybe, but if you had provided a your real case example with the expected output, this would probably have saved the contributors quite some time... – mozway Aug 07 '21 at 07:57

3 Answers3

2

You can try:

mask=(df.T.duplicated() | (df.columns.get_level_values(2).isin(['A','D'])))

Finally:

df=df.loc[:, mask]
#OR
#df=df.T.loc[mask].T
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
  • 1
    It will be interesting to generalize this suggestion for larger set of column at the second level.Say for example,at the second level we have `A`,`B`,`C` ,`D`, and we want to remove both `B` and `C` and maintain both `A` and `D`. – mpx Aug 07 '21 at 07:42
  • @balandongiv In that case use `isin()` updated answer...kindly have a look **:)** – Anurag Dabas Aug 07 '21 at 07:44
  • 1
    Can't you directly do `df.loc[:, mask]`? +1 – mozway Aug 07 '21 at 07:58
  • 1
    @balandongiv kindly check the updated answer **:)** – Anurag Dabas Aug 07 '21 at 08:02
0

Adaptation to df.T.drop_duplicates().T by Anurag Dabas.

Select only unique column and its value

drop_col=['B','C']
drop_single=[df.loc [:, (slice ( None ), slice ( None ), DCOL)].T.drop_duplicates().T for DCOL in drop_col]

Drop the columns from the df

df=df.drop ( drop_col, axis=1, level=2 )

Combine everything to get the intended output

df=pd.concat([df,*drop_single],axis=1)

Complete solution

import pandas as pd
df = pd.DataFrame({'A': [11, 21, 31],
               'B': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df2 = pd.DataFrame({'B': [11, 21, 31],
               'A': [12, 22, 32],
               'C': [13, 23, 33]},
              index=['ONE', 'TWO', 'THREE'])

df.columns = pd.MultiIndex.from_product([['level1'],['level2'],df.columns ])
df2.columns = pd.MultiIndex.from_product([['level1'],['level2'],df2.columns ])
df=pd.concat([df,df2],axis=1)
drop_col=['B','C']
drop_single=[df.loc [:, (slice ( None ), slice ( None ), DCOL)].iloc [:, 1] for DCOL in drop_col]

df=df.drop ( drop_col, axis=1, level=2 )
df_unique=pd.concat(drop_single,axis=1)
df=pd.concat([df,df_unique],axis=1)
print(df)
mpx
  • 3,081
  • 2
  • 26
  • 56
  • But still, the suggestion by Anurag is much more compact https://stackoverflow.com/a/68690202/6446053 – mpx Aug 07 '21 at 07:45
-1

You can try this:

# Drop last 2 columns of dataframe
df.drop(columns=df.columns[-2:], 
        axis=1, 
        inplace=True)