As part of my process, I have a query that I execute hundreds of times in a loop.
Initially, Table A contains all the records (20 million) and Table B contains 0 records. The primary key in both tables is ID.
The query I execute is:
select * from A where A.ID not in (select ID from B) limit 10000
## magic stuff in Python
insert everything into table B, once again.
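Roughly, the full loop looks like this in Python (a minimal sketch; mysql-connector-python, the connection details, and the single-column insert into B are just stand-ins for what I actually do):

import mysql.connector

# hypothetical connection details
conn = mysql.connector.connect(user="user", password="pass", database="mydb")
cur = conn.cursor()

while True:
    # pull the next 10k rows from A that aren't in B yet
    cur.execute("select * from A where A.ID not in (select ID from B) limit 10000")
    rows = cur.fetchall()
    if not rows:
        break

    # ... magic stuff in Python ...

    # insert the processed batch into B (simplified: only the ID column here)
    cur.executemany("insert into B (ID) values (%s)", [(row[0],) for row in rows])  # assuming ID is A's first column
    conn.commit()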
Initially the query runs super fast, but after the Nth loop (100th+), table B has grown to the point where the NOT IN operation takes a noticeable amount of time.
Does anyone have recommendations on how I can speed up the query?
So far, I've increased the default MySQL buffer to 1.5 GB (the IDs are pretty small INTs, so that should be enough).
Caveats:
1) One way to do this would be to delete the processed rows from table A after I've handled them. However, I want to keep table A intact.
The only other method I could think of is adding a column to table A (which I'd index) called PROCESSED, and then updating that column with a second query once the records have been processed and inserted into B. But I was hoping there was an easier solution.
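To make that concrete, the PROCESSED approach would look roughly like this (a sketch; the TINYINT flag, the index name, and the connection details are assumptions on my part):

import mysql.connector

# hypothetical connection details
conn = mysql.connector.connect(user="user", password="pass", database="mydb")
cur = conn.cursor()

# one-time change: add an indexed PROCESSED flag to A
cur.execute(
    "alter table A "
    "add column PROCESSED tinyint not null default 0, "
    "add index idx_processed (PROCESSED)"
)

while True:
    # grab the next batch of unprocessed rows instead of using NOT IN
    cur.execute("select * from A where PROCESSED = 0 limit 10000")
    rows = cur.fetchall()
    if not rows:
        break

    # ... magic stuff in Python, insert into B as before ...

    # second query: flag this batch as processed
    ids = [(row[0],) for row in rows]  # assuming ID is A's first column
    cur.executemany("update A set PROCESSED = 1 where ID = %s", ids)
    conn.commit()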