Could anyone please tell me which one of the following is more efficient? I have tens of millions of rows to process, and performance is critical.
In the second example, table0
is a temporary table, which seems to be much faster to create than table0 in the first example. (Why?) I couldn't use a temporary table in the first example because the variable row could not be declared before the table is created. (table0
holds all distinct rows of the original table, which is not shown in the code below.)
I guess it'd be a good idea to create hash indices for blah2
, blah3
, blah4
and blah5
of table1
in the second example, though it would then take longer to write onto the table.
FOR row IN SELECT * FROM table0
LOOP
IF NOT EXISTS (SELECT 1 FROM table1
WHERE blah2 = row.blah2 AND blah3 = row.blah3
AND blah4 = row.blah4 AND blah5 = row.blah5) THEN
INSERT INTO table2
(blah0, blah1, blah2, blah3, blah4, blah5)
VALUES (row.blah0, row.blah1, row.blah2, row.blah3, row.blah4, row.blah5);
END IF;
END LOOP;
INSERT INTO table2
(blah0, blah1, blah2, blah3, blah4, blah5)
SELECT blah0, blah1, blah2, blah3, blah4, blah5 FROM table0
WHERE NOT EXISTS
(SELECT 1 FROM table1
WHERE table1.blah2 = table0.blah2
AND table1.blah3 = table0.blah3
AND table1.blah4 = table0.blah4
AND talbe1.blah5 = table0.blah5);