0

Suppose I have the following table in redshift:

table
|   a |   b |
|----:|----:|
|   3 |   1 |
|   1 |   8 |
|   7 |   6 |
|   4 |   0 |
|   5 |   6 |
|   5 |   2 |
|   5 |   9 |
|   4 |   3 |
|   7 |   9 |
|   9 |   8 |

And in python, I have the following list of tuples:

x = [(3,1), (4,2), (10, 1), (7,9), (5,2), (6,1)]

I want to extract all rows from the table where the tuple (a,b) is in x using pd.read_sql_query`.

If I only had one column it would be a simple SQL WHERE clause, something like:

query = f'''
SELECT *
FROM table
WHERE a IN {x_sql} 
'''

pd.read_sql_query(query, engine)

My final result would be:

|   a |   b |
|----:|----:|
|   3 |   1 |
|   5 |   2 |
|   7 |   9 |

I wanted to create a query like:

#doesn't work
SELECT *
FROM table 
WHERE a,b IN ((3,1), (4,2), (10, 1), (7,9), (5,2), (6,1)) 
Bruno Mello
  • 4,448
  • 1
  • 9
  • 39

1 Answers1

1

We can use .stack with isin and .loc to filter along the index:

x = [(3,1), (4,2), (10, 1), (7,9), (5,2), (6,1)]

df.loc[df.stack().groupby(level=0).agg(tuple).isin(x)]


       a      b 
1      3      1
6      5      2
9      7      9
halfer
  • 19,824
  • 17
  • 99
  • 186
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • My table is in redshift and I didn't want to load it entirely using pandas – Bruno Mello Aug 17 '20 at 16:50
  • @BrunoMello right missed that, you could create a calculated column I think [this](https://stackoverflow.com/questions/8006901/using-tuples-in-sql-in-clause) question has the syntax you need. you could also create a calculated column and match as strings. Not sure how that is for performance . – Umar.H Aug 17 '20 at 16:53
  • I think the syntax of the accepted answer might work, thanks! – Bruno Mello Aug 17 '20 at 16:56