19

I want to select rows in a particular order given in a list. For example

This dataframe

a=[['car',1],['bike',3],['jewel',2],['tv',5],['phone',6]]

df=pd.DataFrame(a,columns=['items','quantity'])

>>> df
   items  quantity
0    car         1
1   bike         3
2  jewel         2
3     tv         5
4  phone         6

I want to get the rows with this order ['tv','car','phone'], that is, first row tv and then car and then phone. I tried this method but it doesn't maintain order

arr=['tv','car','phone']

df.loc[df['items'].isin(arr)]

   items  quantity
0    car         1
3     tv         5
4  phone         6
Eka
  • 14,170
  • 38
  • 128
  • 212

8 Answers8

17

Here's a non-intrusive solution using Index.get_indexer that doesn't involve setting the index:

df.iloc[pd.Index(df['items']).get_indexer(['tv','car','phone'])]

   items  quantity
3     tv         5
0    car         1
4  phone         6

Note that if this is going to become a frequent thing (by thing, I mean "indexing" with a list on a column), you're better off turning that column into an index. Bonus points if you sort it.

df2 = df.set_index('items')
df2.loc[['tv','car','phone']]  

       quantity
items          
tv            5
car           1
phone         6
cs95
  • 379,657
  • 97
  • 704
  • 746
7

IIUC Categorical

df=df.loc[df['items'].isin(arr)]
df.iloc[pd.Categorical(df['items'],categories=arr,ordered=True).argsort()]
Out[157]: 
   items  quantity
3     tv         5
0    car         1
4  phone         6

Or reindex :Notice only different is this will not save the pervious index and if the original index do matter , you should using Categorical (Mentioned by Andy L, if you have duplicate in items ,reindex will failed )

df.set_index('items').reindex(arr).reset_index()
Out[160]: 
   items  quantity
0     tv         5
1    car         1
2  phone         6

Or loop via the arr

pd.concat([df[df['items']==x] for x in arr])
Out[171]: 
   items  quantity
3     tv         5
0    car         1
4  phone         6
BENY
  • 317,841
  • 20
  • 164
  • 234
5

merge to the rescue:

(pd.DataFrame({'items':['tv','car','phone']})
   .merge(df, on='items')
)

Output:

   items  quantity
0     tv         5
1    car         1
2  phone         6
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
4

For all items to be chosen existing in input df, here's one with searchsorted and should be good on performance -

In [43]: sidx = df['items'].argsort()

In [44]: df.iloc[sidx[df['items'].searchsorted(['tv','car','phone'],sorter=sidx)]]
Out[44]: 
   items  quantity
3     tv         5
0    car         1
4  phone         6
Divakar
  • 218,885
  • 19
  • 262
  • 358
3

Here is another variety that uses .loc.

# Move items to the index, select, then reset.
df.set_index("items").loc[arr].reset_index()

Or another that doesn't change the index.

df.loc[df.reset_index().set_index("items").loc[arr]["index"]]
Chris Farr
  • 3,580
  • 1
  • 21
  • 24
  • Note that `.loc` does not preserve the order in multiindexed Series/DataFrames. This discrepancy with `.loc` between single and multiindexed dataframes is considered a [bug](https://github.com/pandas-dev/pandas/issues/22797) and the [solution](https://github.com/pandas-dev/pandas/pull/28933) is part of the milestones of Pandas 1.1.0. – Ivan De Paz Centeno Jul 16 '20 at 12:00
3

I would create a dictionary from arr and map it to items and dropna, sort_values

d = dict(zip(arr, range(len(arr))))

Out[684]: {'car': 1, 'phone': 2, 'tv': 0}

df.loc[df['items'].map(d).dropna().sort_values().index]

Out[693]:
   items  quantity
3     tv         5
0    car         1
4  phone         6
Andy L.
  • 24,909
  • 4
  • 17
  • 29
0

Why not:

>>> df.iloc[df.loc[df['items'].isin(arr), 'items'].apply(arr.index).sort_values().index]
   items  quantity
3     tv         5
0    car         1
4  phone         6
>>> 
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
0

Why not search for index, filter and re-order:

df['new_order'] = df['items'].apply(lambda x: arr.index(x) if x in arr else -1)

df_new = df[df['new_order']>=0].sort_values('new_order')


   items  quantity  new_order
3     tv         5          0
0    car         1          1
4  phone         6          2

Yaakov Bressler
  • 9,056
  • 2
  • 45
  • 69