2

I have a table that I want to query, but I want to make many specific queries and return a table of any results that have met their condition, and ignore queries that do not exist.

data = (
    (1, '2020-11-19'),
    (1, '2020-11-20'),
    (1, '2020-11-21'),
    (2, '2020-11-19'),
    (2, '2020-11-20'),
    (2, '2020-11-21')
)
        
string = """
    SELECT * FROM my_schema.my_table
    WHERE my_schema.my_table.song_id = %s
    AND my_schema.my_table.date = %s;
"""
        
execute_values(cursor, string, data)
results = cursor.fetchall()

Hopefully this illustrates what I'm trying to achieve here...

I want to perform a series of select statements which each have a pair of parameters. If that pair of parameters is in the database, then append it to the results table.

Is the only way to do this, manually in a for-loop?

klin
  • 112,967
  • 15
  • 204
  • 232
Alec Mather
  • 742
  • 5
  • 20
  • 1
    Does this answer your question? [Python/psycopg2 WHERE IN statement](https://stackoverflow.com/questions/28117576/python-psycopg2-where-in-statement) – Niloct Dec 22 '20 at 16:22
  • @Niloct it does not because in mine, I am pairing 2 very specific parameters together into a list of queries. If I use `WHERE IN` then I would just be returning anything that exists in those parameter tuples. – Alec Mather Dec 22 '20 at 16:25
  • @Alec: What's the question ? You want to `INSERT` the data in `results` into another database table ? – Maurice Meyer Dec 22 '20 at 17:21
  • @MauriceMeyer I want to `SELECT` all the rows of my table which meet the criteria of each of these queries. Imagine performing a lot of single `SELECT` statements, each of which has their own specific query, and returning all of the queries (that are successful) in a `results` table. – Alec Mather Dec 22 '20 at 18:13

2 Answers2

3

Executing many queries in a loop is not a good idea. Use a common table expression to deliver many pairs of parameters to a single query and get results for all of them, like in this Postgres example.

Python code:

data = (
    (1, '2020-11-19'),
    (1, '2020-11-20'),
    (1, '2020-11-21'),
    (2, '2020-11-19'),
    (2, '2020-11-20'),
    (2, '2020-11-21')
)
        
query = """
    with data(song_id, date) as (
        values %s
    )
    select t.*
    from my_table t
    join data d 
    on t.song_id = d.song_id and t.date = d.date::date
"""
execute_values(cursor, query, data)
results = cursor.fetchall()
klin
  • 112,967
  • 15
  • 204
  • 232
  • Thank you so much, this is exactly what I needed and answers my question perfectly. One additional question, does this answer scale if the table I'm searching is say.. 10 million rows? In other words, is there a better/different approach that factors in scalability, or is this kindof just the best we can do? Thanks again! – Alec Mather Dec 27 '20 at 17:53
  • 1
    This is the best solution regardless of the size of the table. In the case of a large table it can be helpful to create the appropriate index: `create index on my_table(song_id, date)` – klin Dec 27 '20 at 18:31
  • I'm having an issue with this solution, where the `cursor.fetchall()` only returns 5 items when I know that it should return hundreds/thousands of lines. If I pass `page_size=10` it only returns 2 items? I'm very confused... Thoughts? – Alec Mather Dec 28 '20 at 17:06
0

A simple and intuitive solution can be to use the "IN" clause with Tuple
i.e. (col1, col2) in ( (data11, data12), (data21, data22) ) e.g.

SELECT * FROM BOOKINGS WHERE (user_id,booked_at) in ((1, '2020-11-19'),(2, '2020-11-20') );"

Full code

import psycopg2
from psycopg2.extras import execute_values
import pandas as pd

# Ref - https://uibakery.io/sql-playground
connection = psycopg2.connect(host='psql-mock-database-cloud.postgres.database.azure.com', user='zvgyzkbybtzsnmvqkwzqmogy@psql-mock-database-cloud', password='tixzlbnnrjlbczfuzbmdwsxd', dbname='booking1665772869599ofknbwmmpsmnffue', port=5432) 

data = (
(125, '2021-11-18T08:08:59.839Z'),
(28, '2021-11-17T20:01:02.244Z'),
(78, '2021-11-17T15:57:27.186Z'))
    
string = "SELECT * FROM BOOKINGS WHERE (user_id,booked_at) in ( %s );"

with connection.cursor() as cursor:
  execute_values(cursor, string, data)
  results = cursor.fetchall()
  col_names = [desc[0] for desc in cursor.description]

df = pd.DataFrame(results, columns=col_names)
df
10xAI
  • 154
  • 1
  • 8
  • I actually had no idea that this was even a thing lol... that being said, having now dealt with this type of problem for a long time now, there are much more performant ways of solving this problem. – Alec Mather Oct 14 '22 at 20:49