2

I have a pd.DataFrame of the following form, indexed by a two-level MultiIndex. The numerical values in column 0 are not necessarily distinct:

>>> idx = pd.MultiIndex.from_arrays([["a", "a", "b", "b", "c", "c"], ["b", "c", "a", "c", "a", "b"]])
>>> df = pd.DataFrame(list(range(6)), index=idx)
     0
a b  0
  c  1
b a  2
  c  3
c a  4
  b  5

I would like to keep only the first occurrence of each unordered combination of the 2 index levels, treating (a, b) and (b, a) as the same combination, to get something like this:

     0
a b  0
  c  1
b c  3

Using pandas 0.23.4 and Python 3.6.5 in this case.

J.K.

3 Answers

2

I think you can convert the index values to frozensets, use Index.duplicated to flag the repeats, and then filter with boolean indexing:

df = df[~df.index.map(frozenset).duplicated()]
print(df)
     0
a b  0
  c  1
b c  3
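
The same idea can be written out step by step to make the intermediate mask visible. This is only a sketch built on the frame from the question; the names `keys`, `mask` and `result` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b", "c", "c"], ["b", "c", "a", "c", "a", "b"]]
)
df = pd.DataFrame(list(range(6)), index=idx)

# Each MultiIndex entry is a tuple such as ("a", "b"). A frozenset is
# hashable and ignores order, so ("a", "b") and ("b", "a") get equal keys.
keys = df.index.map(frozenset)

# duplicated() marks every occurrence after the first as True.
mask = keys.duplicated()

# Keep only the rows whose key has not been seen before.
result = df[~mask]
print(result)
```

One caveat of this sketch: a frozenset also collapses repeated labels, so a pair like ("a", "a") would become the one-element key {"a"}. That is harmless for two-level pairs, but with more levels it could merge combinations that differ only in how often a label repeats.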
jezrael
1

You can work on the index values directly, sorting each pair of labels so that order no longer matters, similar to this question:

# tolist() also works on pandas 0.23.4; the to_list() alias was only added in 0.24
a = np.sort(df.index.tolist(), axis=1)
df.groupby([a[:,0], a[:,1]], sort=False).first()

Or similarly:

mask = pd.DataFrame(np.sort(df.index.tolist(), axis=1)).duplicated()
df[~mask.values]
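
For reference, here is a self-contained sketch of the mask variant, run on the frame from the question. The variable names `ordered` and `result` are illustrative only. It calls `Index.tolist()` because that method is available on pandas 0.23.4, whereas the `to_list()` alias was added in 0.24.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b", "c", "c"], ["b", "c", "a", "c", "a", "b"]]
)
df = pd.DataFrame(list(range(6)), index=idx)

# Turn the index into a 2-D array of labels and sort within each row,
# so ("b", "a") becomes ["a", "b"] and matches ("a", "b").
ordered = np.sort(df.index.tolist(), axis=1)

# DataFrame.duplicated() flags rows that repeat an earlier row.
mask = pd.DataFrame(ordered).duplicated()

# mask has a fresh RangeIndex, so pass the raw values to avoid alignment.
result = df[~mask.values]
print(result)
```

Unlike the groupby variant, this keeps the original index labels of the first occurrence rather than the sorted pair.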
Quang Hoang
0

I found an alternative solution: unstack the frame and use NumPy boolean indexing to blank out everything except the strict upper triangle.

df_u = df.unstack()
df_u[~np.triu(np.ones(3), 1).astype(bool)] = np.nan
df_u = df_u.stack().dropna()
df_u
       0
a b  0.0
  c  1.0
b c  3.0

I will time all of the options to see which one is fastest before selecting the answer. Thank you to everyone who posted!
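
A rough way to run such a comparison is with `timeit` on a larger frame. This is only a sketch: the test frame `big`, the label count of 60, the helper names `by_frozenset` and `by_sorted_rows`, and the number of repetitions are all assumptions made for illustration, and the timings will depend on the machine and the pandas version.

```python
import itertools
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: every ordered pair of 60 distinct labels.
labels = ["k{}".format(i) for i in range(60)]
pairs = list(itertools.permutations(labels, 2))
big = pd.DataFrame({0: range(len(pairs))}, index=pd.MultiIndex.from_tuples(pairs))


def by_frozenset(frame):
    # Order-insensitive keys via frozenset, then drop repeated keys.
    return frame[~frame.index.map(frozenset).duplicated()]


def by_sorted_rows(frame):
    # Sort each label pair, then drop rows that repeat an earlier pair.
    ordered = np.sort(frame.index.tolist(), axis=1)
    return frame[~pd.DataFrame(ordered).duplicated().values]


# Both approaches should select exactly the same rows.
assert by_frozenset(big).equals(by_sorted_rows(big))

for func in (by_frozenset, by_sorted_rows):
    seconds = timeit.timeit(lambda: func(big), number=20)
    print("{}: {:.4f} s for 20 runs".format(func.__name__, seconds))
```

The unstack approach is left out of this sketch because it needs a square, fully populated grid of labels.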

J.K.