
I have a dataframe:

import pandas as pd

df = pd.DataFrame({'x': ['foo', 'bar', 'you', 'foo', 'bar'],
                   'y': ['bar', 'me', 'bar', 'you', 'foo'],
                   'z': [1, 2, 3, 4, 5]},
                  index=['id1', 'id2', 'id3', 'id4', 'id5'])


                 x       y          z
   id1           foo     bar        1
   id2           bar     me         2
   id3           you     bar        3
   id4           foo     you        4
   id5           bar     foo        5

And a list of (x, y) pairs:

l = [('foo', 'bar'), ('bar', 'foo')]

I want to extract all rows whose values in columns x and y match one of these pairs, in that order:

(foo, bar) -> id1, foo, bar, 1
(bar, foo) -> id5, bar, foo, 5

How can I select rows based on the values of two columns at once?

jtlz2
honeymoon
    Have you looked at `.isin()`? You can construct a column from `(x, y)` and then check it against your `l`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html – jtlz2 Apr 27 '22 at 08:47

2 Answers

You can convert the x and y columns to a MultiIndex, which makes it possible to compare each (x, y) pair against the list with Index.isin and then filter with boolean indexing:

l = [('foo', 'bar'), ('bar', 'foo')]
df1 = df[df.set_index(['x','y']).index.isin(l)]
print(df1)
       x    y  z
id1  foo  bar  1
id5  bar  foo  5
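
For reference, the same idea as a self-contained script. This is only a sketch: the helper name rows_matching_pairs and the variable names pairs and matched are made up for illustration and are not part of pandas, and the sample data is copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {"x": ["foo", "bar", "you", "foo", "bar"],
     "y": ["bar", "me", "bar", "you", "foo"],
     "z": [1, 2, 3, 4, 5]},
    index=["id1", "id2", "id3", "id4", "id5"],
)
pairs = [("foo", "bar"), ("bar", "foo")]


def rows_matching_pairs(frame, cols, wanted):
    """Hypothetical helper: keep rows whose values in `cols` form a tuple in `wanted`."""
    # set_index returns a new frame, so `frame` itself is left unchanged
    mask = frame.set_index(list(cols)).index.isin(wanted)
    # mask is a boolean NumPy array with one entry per row
    return frame[mask]


matched = rows_matching_pairs(df, ["x", "y"], pairs)
print(matched)
```

The order inside each tuple matters here: ('foo', 'bar') matches id1 but not a row with x = 'bar' and y = 'foo', which is why both orderings are listed in l.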
jezrael

Following https://stackoverflow.com/a/16068497/1021819, first build a column of tuples:

df['xy'] = list(zip(df.x, df.y))

That will give you a column of tuples (x,y).

Then use .isin() on that column (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html; df.xy is a Series, so Series.isin is the method that actually runs):

mask = df.xy.isin(l)
print(df[mask])  # in a Jupyter notebook, display(df[mask]) also works

Hey presto!
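
If you would rather not add a helper column to df, a variation on the same approach keeps the tuples in a separate Series. This is a sketch under a few assumptions: the names pairs, xy and selected are illustrative only, and the data is copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {"x": ["foo", "bar", "you", "foo", "bar"],
     "y": ["bar", "me", "bar", "you", "foo"],
     "z": [1, 2, 3, 4, 5]},
    index=["id1", "id2", "id3", "id4", "id5"],
)
pairs = [("foo", "bar"), ("bar", "foo")]

# One (x, y) tuple per row, aligned with df's index but stored outside df
xy = pd.Series(list(zip(df["x"], df["y"])), index=df.index)

# Boolean Series: True where the row's tuple appears in pairs
mask = xy.isin(pairs)

selected = df[mask]
print(selected)
```

Because the tuples live in their own Series, df keeps only its original x, y and z columns.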

jtlz2