1

We have more than 20 million records in a table and we tried to update 3K records but it took more than 7 mins and it didn't complete, so terminated the query.

Sample query,

UPDATE TABLE_A
SET STATUS = 'PENDING'
WHERE ID IN (
  SELECT ID
  FROM TMP_TABLE_A_STATUS_FIX
);   /*took more than 7 mins and didn't complete even after that*/

We collected all the ids which needs to be updated in a temp table TMP_TABLE_A_STATUS_FIX(has only 3K records).

As the above query took too long, we updated individually like,

UPDATE TABLE_A SET STATUS = 'PENDING' WHERE ID = 1;
UPDATE TABLE_A SET STATUS = 'PENDING' WHERE ID = 2;
UPDATE TABLE_A SET STATUS = 'PENDING' WHERE ID = 3;
.
.
.
UPDATE TABLE_A SET STATUS = 'PENDING' WHERE ID = 2999;
UPDATE TABLE_A SET STATUS = 'PENDING' WHERE ID = 3000; /*updated all 3K recordds in 0.00 secs*/

I really do not understand what is wrong with the IN query.

Can someone explain what is wrong in the update which has INin where clause and why it didn't complete even after 7 mins?

Note - ID in TABLE_A is primary key and its indexed. STATUS in TABLE_A is also indexed. We removed the index for STATUS as we thought updating the index column may take time due to index re-organization but that didn't help.

whoami
  • 1,517
  • 1
  • 21
  • 29
  • Do you have index on ID column? without index tables with large number of records take too much time due to massive IO operatore to filter the effected records. – Adil Jul 22 '16 at 04:54
  • @Adil, yes we do have indexes, updated the question. – whoami Jul 22 '16 at 04:56
  • 2
    My guess is that on each `UPDATE` call, your query checks that the `ID` it is current examining matches an `ID` in the `TMP_TABLE_A_STATUS_FIX` table, which it does by examining every `ID` in the `TMP_TABLE_A_STATUS_FIX` table. Thats 3k * 3mil operations. – Tyler Jul 22 '16 at 04:57
  • TABLE_A.ID and TMP_TABLE_A_STATUS_FIX.ID both need indexed. – SIDU Jul 22 '16 at 04:58
  • alter table TMP_TABLE_A_STATUS_FIX add index (id) – SIDU Jul 22 '16 at 04:59
  • 1
    IN is an identified database takes too long to process mostly if we are working a huge data from our database, use other way like exists.. – Vijunav Vastivch Jul 22 '16 at 05:00
  • 2
    Similar issue was discussed here: http://stackoverflow.com/questions/5018284/mysql-in-queries-terribly-slow-with-subquery-but-fast-with-explicit-values – Rocherlee Jul 22 '16 at 05:02
  • @Tyler - you are right, now I understand the delay. – whoami Jul 22 '16 at 05:06
  • @SIDU - indexing temp table didn't fix, forgot to mention in the question. sorry. – whoami Jul 22 '16 at 05:07
  • @Rocherlee - thanks for the link, helped me to understand how bad the subqueries are in big data sets. – whoami Jul 22 '16 at 05:08

1 Answers1

6

Insetad of using an in clause try with an inner join

UPDATE TABLE_A
INNER JOIN TMP_TABLE_A_STATUS_FIX on TABLE_A.ID = TMP_TABLE_A_STATUS_FIX .ID
SET STATUS = 'PENDING';
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107