I'm using python and SQLAlchemy. The DB is Microsoft's SQL Server.
I have two queries, both rather simple and each involve around half a million of rows:
- subquery: gets a list of IDs from a table. This IDs are meant to be ignored by the other query.
- query: gets rows from a second table. It has several wheres, but the only relevant one is a not_in clause:
where(table.c.column.not_in(subquery))
The queries are simple SQLAlchemy Select constructs like so:
query.Select(table.c.column).where(table.c.column == 'placeholder')
Both of them work well when executed separately (and, for the main one, when the not_in() clause is removed), and finish in around 30 seconds.
Here is the problem: when the subquery is run as part of the where clauses of the main query, the app freezes and never finishes executing. I have left it running for over an hour, and it never finished.
It never throws an error, and it never crashes. It simply hangs, forever.
I also tried excracting the results of the subquery into a list, then passing said list to the not_in() function. When I did that, the app crashed. As of yet, I haven't had any luck elucidating why.
I tried reducing the amount of rows returned by the subquery to single one, and it seemed to work. So, the size of the results might be the problem. Is there a more efficient way of handling this operation?