I have a big partitioned table to update (some billions of records and about 95M rows to update). Another lookup table has about 10M IDs that identify which rows to update in the big table mentioned above.
The following query used to identify partitions in the big table that have records to update:
SELECT /*+ parallel(8) */
dict.subobject_name
FROM (SELECT DISTINCT dbms_rowid.rowid_object (trg.ROWID) data_object_id
FROM lookup_table lkp INNER JOIN big_table trg ON lkp.id = trg.id) subq
INNER JOIN all_objects dict ON subq.data_object_id = dict.data_object_id;
DBMS_ROWID utility uses embedded ROWID functionality to get the segment identifier. So, this segment contains rows to update after joining with the lookup table. Then it joined with the ALL_OBJECTS data dictionary to get the connected partitions of the big table. As a result, I get 3 from 30 partitions to operate for partition exchange, but not all the partitions.
On the big table it took about 14 min to execute. Small table with 290K records to update it took about 53 sec. But the point is that it will run on about a thousand of tables and be executed for each of them in the FOR LOOP.
Do not think that EXISTS or IN will change the situation dramatically, but will check.
Any thoughts about this query optimization?