I have a fairly large table with a few million rows. I am trying to write efficient code that selects the rows where two columns match any of the pairs in a list passed in from Python code. A reasonable answer was posted, for example:
select *
from table
where convert(varchar(20), id1) + '-' + id2 in ('2261-7807403', '2262-9807403')
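For context, I build the key strings and the query on the Python side roughly like this (a sketch; the connection string is a placeholder, and pairs stands for the list of (id1, id2) tuples I receive):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@my_dsn")  # placeholder connection

# pairs is the list of (id1, id2) tuples passed in from the Python side
pairs = [("2261", "7807403"), ("2262", "9807403")]

# build the 'id1-id2' keys and splice them into the IN (...) list
keys = ", ".join("'{0}-{1}'".format(id1, id2) for id1, id2 in pairs)
query = ("select * from table "
         "where convert(varchar(20), id1) + '-' + id2 in ({0})".format(keys))

df = pd.read_sql_query(query, engine)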
The returned table is read into a dataframe via pd.read_sql_query. I have two issues here (besides the fact that it is slow). One, id2 can be NULL, and the query fails for those rows. The other, more important, issue is that the size of the pair list in the where clause can vary wildly, from 1 to millions.
My understanding is that when the list is large it is better to pull the whole columns into Python with pandas and filter them there. But how do I make the transition between a small and a large number of pairs smooth? Is there a way to do it with some clever combination of SQL Server and Python?
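To illustrate what I mean by the transition: for a short list I would keep the IN query above, and for a long list I would read the table and filter in pandas. A crude sketch of that switch (the cutoff is made up, and pairs / engine are the same objects as in the sketch above):

import pandas as pd

pairs_df = pd.DataFrame(pairs, columns=["id1", "id2"])

if len(pairs) <= 1000:  # made-up cutoff between "small" and "large"
    keys = ", ".join("'{0}-{1}'".format(id1, id2) for id1, id2 in pairs)
    query = ("select * from table "
             "where convert(varchar(20), id1) + '-' + id2 in ({0})".format(keys))
    df = pd.read_sql_query(query, engine)
else:
    # pull the whole table (or at least the key columns) and filter in pandas;
    # an inner merge keeps only the rows whose (id1, id2) appear in pairs_df
    # (the dtypes of id1/id2 have to match between the two frames)
    full = pd.read_sql_query("select * from table", engine)
    df = full.merge(pairs_df, on=["id1", "id2"], how="inner")

The hard-coded cutoff is exactly the part that feels wrong, hence the question.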