Removing symmetric pairs in DataFrame with MultiIndex

Question

I have a pd.DataFrame of the following form; the numerical values in column 0 are not necessarily distinct:

>>> import pandas as pd
>>> idx = pd.MultiIndex.from_arrays([["a", "a", "b", "b", "c", "c"], ["b", "c", "a", "c", "a", "b"]])
>>> df = pd.DataFrame(list(range(6)), index=idx)
>>> df
     0
a b  0
  c  1
b a  2
  c  3
c a  4
  b  5

I would like to keep only the first occurrence of each unordered combination of the two index levels, treating (a, b) and (b, a) as the same pair, to get something like this:

Using pandas 0.23.4 and Python 3.6.5 in this case.
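Stated in plain Python, the requirement is: ("a", "b") and ("b", "a") count as the same pair, and only the first row carrying either orientation should survive. This is a minimal sketch, assuming the index is available as a list of 2-tuples; `first_unordered_positions` is a hypothetical helper, not part of pandas.

```python
def first_unordered_positions(pairs):
    """Return the positions of rows whose unordered pair has not been seen yet."""
    seen = set()
    keep = []
    for pos, (x, y) in enumerate(pairs):
        # ("b", "a") and ("a", "b") both map to the key ("a", "b").
        key = tuple(sorted((x, y)))
        if key not in seen:
            seen.add(key)
            keep.append(pos)
    return keep


pairs = [("a", "b"), ("a", "c"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "b")]
print(first_unordered_positions(pairs))  # [0, 1, 3]
```

Any pandas answer should therefore keep the rows at positions 0, 1 and 3 of the example frame.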

score 2 · Accepted Answer · answered Jan 25 '21 at 14:36


I think you can use Index.duplicated with the index values converted to frozensets, and then filter with boolean indexing:

df = df[~df.index.map(frozenset).duplicated()]
print(df)
     0
a b  0
  c  1
b c  3
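Why this works: each MultiIndex entry is a tuple, and a frozenset ignores order, so ("a", "b") and ("b", "a") hash to the same key; `duplicated()` then flags every repeat after the first (its default is `keep="first"`). A self-contained sketch on a small hypothetical frame, chosen to show that the surviving row keeps its original orientation and that a self-pair such as ("a", "a") collapses to frozenset({"a"}), which cannot collide with any two-label pair:

```python
import pandas as pd

# Hypothetical data: ("b", "a") appears before its mirror ("a", "b").
idx = pd.MultiIndex.from_tuples([("b", "a"), ("a", "a"), ("a", "b"), ("a", "c")])
df = pd.DataFrame({0: [10, 20, 30, 40]}, index=idx)

# Order-insensitive key per row: frozenset({"a", "b"}) for both orientations.
keys = df.index.map(frozenset)

# Drop every row whose key was already seen earlier.
result = df[~keys.duplicated()]
print(result)
```

The row ("a", "b") is dropped because ("b", "a") came first, and ("b", "a") is kept with its original label.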

answered Jan 25 '21 at 14:36

jezrael


score 1 · Answer 2 · answered Jan 25 '21 at 14:38


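You can work on the index values directly, similar to this question: sort each pair so that symmetric pairs become identical, then group or deduplicate on the sorted values.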

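# sort each pair so (b, a) becomes (a, b); tolist() also works on pandas < 0.24,
# where Index.to_list() does not exist yet
a = np.sort(df.index.tolist(), axis=1)
df.groupby([a[:,0], a[:,1]], sort=False).first()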
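One property of the groupby variant worth knowing: the result is labelled with the sorted pair, not with the orientation that appeared first, and `first()` takes the first non-null value per column. A small sketch on a hypothetical frame in which ("b", "a") precedes ("a", "b"):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first occurrence of the {a, b} pair is ("b", "a").
idx = pd.MultiIndex.from_tuples([("b", "a"), ("a", "c"), ("a", "b")])
df = pd.DataFrame({0: [10, 20, 30]}, index=idx)

# Sorted copy of every pair: [["a", "b"], ["a", "c"], ["a", "b"]].
a = np.sort(df.index.tolist(), axis=1)

# sort=False keeps groups in order of first appearance.
grouped = df.groupby([a[:, 0], a[:, 1]], sort=False).first()
print(grouped)
```

The value 10 comes from the ("b", "a") row but is reported under ("a", "b"), which is fine for a canonical ordering and surprising if the original labels matter.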

Or similarly:

mask = pd.DataFrame(np.sort(df.index.tolist(), axis=1)).duplicated()
df[~mask.values]  # .values: mask has a RangeIndex, not df's MultiIndex
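Unlike the groupby form, the mask variant keeps the original rows and labels. The `.values` matters because `mask` carries a fresh `RangeIndex`; passing the Series itself would make pandas try to align it with the MultiIndex rather than use it positionally. A self-contained sketch, again on hypothetical data where ("b", "a") comes first:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([("b", "a"), ("a", "c"), ("a", "b")])
df = pd.DataFrame({0: [10, 20, 30]}, index=idx)

# One row per original row, holding that row's pair in sorted order.
sorted_pairs = pd.DataFrame(np.sort(df.index.tolist(), axis=1))

# Row-wise duplicated(): True for every repeat after the first occurrence.
mask = sorted_pairs.duplicated()

# Strip the RangeIndex so the boolean mask is applied by position.
result = df[~mask.values]
print(result)
```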

answered Jan 25 '21 at 14:38

Quang Hoang


score 0 · Answer 3 · answered Jan 25 '21 at 15:06

Found an alternative solution: unstack the frame into a label-by-label table, use NumPy boolean indexing to blank out everything except the strict upper triangle, and stack it back.

df_u = df.unstack()
# keep only the strict upper triangle; 3 is the number of distinct labels here
df_u[~np.triu(np.ones(3), 1).astype(bool)] = np.nan
df_u = df_u.stack().dropna()
df_u
       0
a b  0.0
  c  1.0
b c  3.0
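Two caveats with the triangle mask: `np.ones(3)` hard-codes the number of distinct labels, and the mask keeps whichever orientation lies above the diagonal (row label < column label after `unstack` sorts them), not the first occurrence. Below is a generalized sketch, assuming both index levels draw from the same label set. It reindexes to a square frame and uses `DataFrame.where` instead of assignment (a substitution for the original code), and `upper_triangle_pairs` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def upper_triangle_pairs(s):
    """Keep entries with row label < column label from a 2-level Series."""
    wide = s.unstack()
    # Square the table over the union of labels so row i and column i match.
    labels = sorted(set(wide.index) | set(wide.columns))
    wide = wide.reindex(index=labels, columns=labels)
    # k=1 excludes the diagonal; the lower triangle holds the mirrored pairs.
    upper = np.triu(np.ones(wide.shape, dtype=bool), k=1)
    return wide.where(upper).stack().dropna()


idx = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b", "c", "c"], ["b", "c", "a", "c", "a", "b"]]
)
s = pd.Series(range(6), index=idx)
print(upper_triangle_pairs(s))

# Orientation caveat: ("b", "a") comes first, yet the ("a", "b") value is kept.
s2 = pd.Series([10, 20], index=pd.MultiIndex.from_tuples([("b", "a"), ("a", "b")]))
print(upper_triangle_pairs(s2))
```

As in the original, the values come back as floats because unstacking introduces NaN.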

I will check all of the options to see which one performs fastest before accepting an answer. Thank you to everyone who posted!
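A possible timing harness for that comparison, sketched with `timeit` on hypothetical synthetic data (every ordered pair of 100 made-up labels). Only the frozenset and sorted-mask approaches are included, because both keep first occurrences in their original orientation and can be checked for equality; the groupby and triangle variants relabel or reorder rows, so they would need a different correctness check. Absolute timings will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_labels):
    """Hypothetical test data: every ordered pair of distinct labels, once."""
    labels = [f"k{i}" for i in range(n_labels)]
    pairs = [(x, y) for x in labels for y in labels if x != y]
    idx = pd.MultiIndex.from_tuples(pairs)
    return pd.DataFrame({0: np.arange(len(pairs))}, index=idx)


def via_frozenset(df):
    return df[~df.index.map(frozenset).duplicated()]


def via_sorted_mask(df):
    mask = pd.DataFrame(np.sort(df.index.tolist(), axis=1)).duplicated()
    return df[~mask.values]


df = make_frame(100)
# Both approaches should keep exactly the same rows in the same order.
assert via_frozenset(df).equals(via_sorted_mask(df))

for func in (via_frozenset, via_sorted_mask):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per call")
```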
