Update statement taking more than 24 hours for updating 32k rows

Question

Below is the update statement that I am running for 32k times and it is taking more than 15 hours and running. I have to update the value in table 2 for 32k different M_DISPLAY VALUES.

UPDATE TABLE_2 T2  SET  T2.M_VALUE = 'COL_ANC' 
WHERE EXISTS (SELECT 1 FROM TABLE_2 T1  WHERE  TRIM(T1.M_DISPLAY) = 'ANCHORTST' AND T1.M_LABEL=T2.M_LABEL );

Am not sure Why is it taking such a long time as I have tuned the query,

I have copied 32000 update statements in a Update.sql file and running the SQL in command line. Though it is updating the table, it is a neverending process

Please advice if I have gone wrong anywhere

Regards

You need to add some more details, A) where the m_display_values come from b) why must this be 32k separate updates and not just 1 update . c) Explain plan / Schema information d) It also sounds like the 32k statements are running as a single transaction. Is that intended — Andrew, May 15 '18 at 08:27
Where do you get the values `COL_ANC` and `ANCHORTST` from? Is it another table (or data structure) that contains the entire update instructions? — Lukas Eder, May 15 '18 at 08:50
The first thing you should examine is the execution plan of your query - see [here] some advise how to get and publish it. Second - if you have not an unlimited time - you should follow **neither** a 32K upadte script **nor** a *for loop*, but a one statement `UPDATE`using a temporary table containing the update values. — Marmite Bomber, May 15 '18 at 09:28
m_display values vs M_VALUE are a part of excel sheet .There are 32k M_display values , but just 2 m_values (COL_ANC,COL_PNC) . In a single update i cant input 32k M_Display values. 32k lines are a part of file , whihc would run one by one ..Hope this clarifies — LearningCpp, May 15 '18 at 09:35

score 2 · Accepted Answer · answered May 15 '18 at 09:08

Using FORALL

If you cannot rewrite the query to run a single bulk-update instead of 32k individual updates, you might still get lucky by using PL/SQL's FORALL. An example:

DECLARE
  TYPE rec_t IS RECORD (
    m_value   table_2.m_value%TYPE,
    m_display table_2.m_display%TYPE
  );

  TYPE tab_t IS TABLE OF rec_t;

  data tab_t := tab_t();
BEGIN

  -- Fill in data object. Replace this by whatever your logic for matching
  -- m_value to m_display is
  data.extend(1);
  data(1).m_value := 'COL_ANC';
  data(1).m_display := 'ANCHORTST';

  -- Then, run the 32k updates using FORALL
  FORALL i IN 1 .. data.COUNT
    UPDATE table_2 t2
    SET t2.m_value = data(i).m_value
    WHERE EXISTS (
      SELECT 1
      FROM table_2 t1
      WHERE trim(t1.m_display) = data(i).m_display
      AND t1.m_label = t2.m_label
    );
END;
/

Concurrency

If you're not the only process on the system, 32k updates in a single transaction can hurt. It's definitely worth committing a few thousand rows in sub-transactions to reduce concurrency effects with other processes that might read the same table while you're updating.

Bulk update

Really, the goal of any improvement should be bulk updating the entire data set in one go (or perhaps split in a few bulks, see concurrency).

If you had a staging table containing the update instructions:

CREATE TABLE update_instructions (
  m_value VARCHAR2(..),
  m_display VARCHAR2(..)
);

Then you could pull off something along the lines of:

MERGE INTO table_2 t2
USING (
  SELECT u.*, t1.m_label
  FROM update_instructions u
  JOIN table_2 t1 ON trim(t1.m_display) = u.m_display
) t1
ON t2.m_label = t1.m_label
WHEN MATCHED THEN UPDATE SET t2.m_value = t1.m_value;

This should be even faster than FORALL (but might have more concurrency implications).

Indexing and data sanitisation

Of course, one thing that might definitely hurt you when running 32k individual update statements is the TRIM() function, which prevents using an index on M_DISPLAY efficiently. If you could sanitise your data so it doesn't need trimming first, that would definitely help. Otherwise, you could add a function based index just for the update (and then drop it again):

CREATE INDEX i ON table_2 (trim (m_display));

i dint really get staging concept, could you please elaborate , because update set should be done using update_instructions table and not T2.M_VALUE=T1.M_VALUE , thats just for comarision — LearningCpp, May 15 '18 at 09:40
I understand (from the comments above) that those values originate from an Excel sheet. Ideally, you would insert that data from Excel into Oracle into a table like my `update_instructions`, and then proceed with either the `FORALL` or the `MERGE` approach. — Lukas Eder, May 15 '18 at 09:56
Sir, I have created a function index on trim(m_display) and it has just taken 30 seconds to update all 32 k rows, Do u want me to beleive it , i also checked the values in DB also . Yesterday it ran about 24 hours . Am confused now :) — LearningCpp, May 15 '18 at 10:42
Why are you confused? It does seem like an important improvement, given that instead of scanning the entire table for `'ANCHORTST'` (how big is it anyway), you're now only searching a few rows in the index, per update execution... — Lukas Eder, May 15 '18 at 10:53

Maksym · Answer 2 · 2018-05-15T10:35:15.693

0

The query and subquery query the same table: TABLE_2. Assuming that M_LABEL is unique, the subquery returns 1s for all rows in TABLE_2 where M_DISPLAY is ANCHORTST. Then the update query updates the same (!) TABLE_2 for all 1s returned from subquery - so for all rows where M_DISPLAY is ANCHORTST.

Therefore, the query could be simplified, exploiting the fact that both update and select work on the same table - TABLE_2:

UPDATE TABLE_2 T2  SET  T2.M_VALUE = 'COL_ANC' WHERE TRIM(T2.M_DISPLAY) = 'ANCHORTST'

If M_LABEL is not unique, then the above is not going to work - thanks to commentators for pointing that out!

edited May 15 '18 at 10:35

answered May 15 '18 at 08:34

Maksym

1
2

1

That query does not perform the same logic, it is missing the correlation based on matching M_LABEL values – Andrew May 15 '18 at 08:36
This answer is might be correct, but if it is incorrect it shows missing information in the question. I see you assumed that the M_LABEL was a primary key. I took the question to mean that there is potentially a n:m relation between M_LABEL and M_DISPLAY. In that case the answer is wrong. – kutschkem May 15 '18 at 08:36
The suggested query may update less rows than the original. – Lukas Eder May 15 '18 at 08:49
Is it performing a different logic though? Both queries query the same table - TABLE_2. The subquery returns 1s for all rows where M_DISPLAY has a specified value. So the result of the update query will be updating TABLE_2 where M_DISPLAY has specified value, that's all. There are no two tables there. – Maksym May 15 '18 at 09:53
Yes, *very* different logic. Think that one through! – Lukas Eder May 15 '18 at 09:58

score 0 · Answer 3 · answered May 15 '18 at 11:04

For significantly faster execution:

Ensure that you have created an index on M_DISPLAY and M_LABEL columns that are in your WHERE clause.
Ensure that M_DISPLAY has a function based index. If it does not, then do not pass it to the TRIM function, because the function will prevent the database from using the index that you have created for the M_DISPLAY column. TRIM the data before storing in the table.

Thats it.

By the way, as has been mentioned, you shouldn't need 32k queries for meeting your objective. One will probably suffice. Look into query based update. As an example, see the accepted answer here: Oracle SQL: Update a table with data from another table

Update statement taking more than 24 hours for updating 32k rows

3 Answers3

Using FORALL

Concurrency

Bulk update

Indexing and data sanitisation