3

I have a multi-index dataframe that looks like

uid tid text

abc x t1

bcd y t2

uid and tid are the indexes. I have a list of uids, and want to get the rows corresponding to the uids in that list, but keeping the 2nd level index values (tid). I want to do it without running any explicit loop. Is that possible?

Rakib
  • 7,435
  • 7
  • 29
  • 45

1 Answers1

3

Data:

L = ['abc', 'bcd']

print (df)
         text
uid  tid     
abc  x     t1
abc1 x     t1
bcd  y     t2

1.slicers

idx = pd.IndexSlice
df1 = df.loc[idx[L,:],:]

2.boolean indexing + mask with get_level_values + isin:

df1 = df[df.index.get_level_values(0).isin(L)]

3.query, docs:

df1 = df.query('@L in uid')
print (df1)
        text
uid tid     
abc x     t1
bcd y     t2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thanks @jezrael. With approach one, I get 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)' error. But 2 works. Which is the most efficient? I have a huge data frame. – Rakib Sep 30 '17 at 18:00
  • the fastest is sorting first by `df = df.sort_index()` and then use first method. `sorting` is explained ['In 97'](http://pandas.pydata.org/pandas-docs/stable/advanced.html#sorting-a-multiindex) – jezrael Sep 30 '17 at 18:02