
When I have a dataframe

from pandas import DataFrame
df = DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
df

   A  B
0  5  1
1  6  2
2  3  3
3  4  5

I can use

df[df['A'].isin([3, 6])]

in order to select rows having the passed values.

Is there also a way to keep the order of the input list?

So that my output is not:

   A  B
1  6  2
2  3  3

but

   A  B
1  3  3
2  6  2
Zero
Nikita
  • Doing `df[...]` with boolean indexing keeps the order of the DataFrame, regardless of whether the `...` part involves `isin` or not. You would have to reorder your DataFrame separately, before or after applying `isin`. – BrenBarn May 01 '14 at 18:38
  • OK, isn't there a way to reorder the output by using the input list as the rule? – Nikita May 01 '14 at 18:45
  • No, because `isin` only checks, one item at a time, whether each item "is in" the list at all, not *where* it is in the list. It doesn't pay attention to the list's order. Like I said, you would need to do the ordering in a separate step. – BrenBarn May 01 '14 at 19:15

7 Answers


This question is a bit old, but I stumbled into having to do this. This is how I resolved the problem. I believe it's quite a generic and simple solution that hasn't been proposed yet here, and that actually doesn't use the isin() method:

df.set_index('A').loc[[3,6]].reset_index()

With the example provided:

>>> df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
>>> df.set_index('A').loc[[3,6]].reset_index()
   A  B
0  3  3
1  6  2

Of course, this has the disadvantage that it loses the original index. To preserve the index, you could instead do:

>>> df.reset_index().set_index('A').loc[[3,6]].reset_index().set_index('index')
       A  B
index      
2      3  3
1      6  2
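One thing to watch for with this approach: in recent pandas versions, passing .loc a list that contains a label missing from the index raises a KeyError. Below is a minimal sketch of one way around that, not a definitive fix. The list `wanted` and its missing value 7 are made up for illustration. It drops the missing values before the lookup:

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
wanted = [3, 7, 6]  # hypothetical input list; 7 does not occur in column A

indexed = df.set_index('A')
# Keep only the requested values that exist, preserving the list order
present = [value for value in wanted if value in indexed.index]
result = indexed.loc[present].reset_index()
print(result)
```

Like the original one-liner, this assumes the values in A are unique enough for your purposes; duplicated values in A would all be returned for each matching entry of the list.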
Germán Sanchis

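This is a bit long, but it works: filter with isin(), then sort_values() on an ordered categorical built from the list.

import pandas

df = pandas.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
mylist = [3, 6]
# .copy() avoids a SettingWithCopyWarning when adding a column to the filtered frame
ndf = df[df['A'].isin(mylist)].copy()
# An ordered categorical sorts by position in mylist rather than by value
ndf['sort_cat'] = pandas.Categorical(ndf['A'], categories=mylist, ordered=True)
ndf = ndf.sort_values('sort_cat')
# ndf.reset_index(drop=True) would additionally renumber the rows from 0
print(ndf)
   A  B sort_cat
2  3  3        3
1  6  2        6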

(I based this answer on "sort pandas dataframe based on list".)
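If you would rather not add a helper column, the same idea can probably be written with the key argument of sort_values (available in pandas 1.1 and later, as far as I know). This is only a sketch: the dictionary `rank` is a name I made up, and the list is reversed to [6, 3] here so that the effect of the ordering is visible:

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
mylist = [6, 3]

# Map each wanted value to its position in the list (hypothetical helper)
rank = {value: position for position, value in enumerate(mylist)}

# Filter with isin, then sort column A by list position instead of by value
result = df[df['A'].isin(mylist)].sort_values('A', key=lambda s: s.map(rank))
print(result)
```

The original index labels are kept, so you can still call reset_index(drop=True) afterwards if you want a fresh index.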

J. Rigby

Another option, which filters and sorts in one shot:

import pandas as pd
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the per-value slices in list order
pd.concat([df[df.A == i] for i in [3, 6]])
javigzz

This is the best solution I found:

 df.iloc[pd.Index(df.A).get_indexer([3,6])]

Result:

>>> df.iloc[pd.Index(df.A).get_indexer([3,6])]
   A  B
2  3  3
1  6  2

Credit: @cs95
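A caveat worth knowing about, as far as I understand get_indexer: it returns -1 for any value it cannot find, and iloc then treats -1 as "the last row", so a missing value silently selects the wrong row. It also expects the values in A to be unique. A rough sketch of guarding against missing values follows; the list `wanted` and its missing value 7 are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
wanted = [3, 7, 6]  # hypothetical input list; 7 does not occur in column A

# Position of each wanted value in column A, or -1 when it is absent
positions = pd.Index(df['A']).get_indexer(wanted)

# Drop the -1 entries so iloc does not pick up the last row by accident
result = df.iloc[positions[positions >= 0]]
print(result)
```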

lenhhoxung

You can make the input list a dataframe and use the merge function. I've found this particularly useful for large input lists where order matters.

For example:

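import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
# Named input_df so the built-in input() is not shadowed
input_df = pd.DataFrame({'input': [3, 6]})
# The merge keeps the row order of the left frame, i.e. the order of the list
output = input_df.merge(df, left_on='input', right_on='A').loc[:, ['A', 'B']]
print(output)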

   A  B
0  3  3
1  6  2

There are two caveats. First, you have to specify which column of df to match against, using the right_on argument of merge. Second, the resulting dataframe gets a new index, so the original row labels are lost.
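If the original row labels matter, one possible workaround for the second caveat is to carry the index through the merge as an ordinary column and restore it afterwards. This is a sketch under my own assumptions, not part of the approach above: `order_df` is a made-up name, and the column is called A on both sides so that on='A' can be used:

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
order_df = pd.DataFrame({'A': [3, 6]})  # hypothetical frame holding the wanted order

# reset_index() turns the row labels into a column named 'index';
# an inner merge keeps the order of the left frame's keys
result = order_df.merge(df.reset_index(), on='A', how='inner').set_index('index')
print(result)
```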

M. Wang

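isin is a set operation: it only tests membership, and pandas aligns the result with the frame being filtered, so the output normally follows the order of the reference frame rather than the order of the input list.

If you REALLY want to, you could do this (it sorts the matches by value, so it reproduces the list order only when the list is ascending):

In [15]: df.take(df['A'][df['A'].isin([3,6])].sort_values().index)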
Out[15]: 
   A  B
2  3  3
1  6  2

[2 rows x 2 columns]
Jeff
  • I ended up using [x for (y,x) in sorted(zip(this,list(df.B[df.A.isin([3,6])])))] giving me the wanted result. Unfortunately I could not achieve the same by your solution. – Nikita May 02 '14 at 21:37
  • Nikita's solution fails for me with NameError: name 'this' is not defined. I found that Jeff's solution works for the example df, but not for a longer df. – J. Rigby Oct 13 '16 at 17:49

It is not exactly the same question, but in my case the solution there gave me the data frame in the same order as the list passed to isin, which is what I wanted. Take a look here:

How to maintain order when selecting rows in pandas dataframe?

Perhaps it could help you.

Ariel