7

I'm working in an IO-bound system (and this isn't going to change), so I'm rewriting some SQL to only update when it needs to, and it's going really well: I'm seeing about a 70% performance improvement. The only problem is that the SQL is more bloated, which isn't the end of the world, just more code to maintain.

So my question is: is there an easier way to get Oracle to only update when it needs to, compared to adding a where clause:

update table_name
   set field_one = 'one'
 where field_one != 'one';

Note: The real code is much more complex so adding a 'where' like this can sometimes double the length of the query.

Using 11g

Jon Heller
  • 34,999
  • 6
  • 74
  • 132
Bill
  • 1,237
  • 4
  • 21
  • 44
  • Can't think of an easier way. Of course, the example you posted doesn't handle `NULL` values correctly. If `field_one` can be `NULL`, your `WHERE` clause would exclude it. – Justin Cave Jul 28 '14 at 20:12
  • that's a good point. fortunately for me this database doesn't have any fields that can be null (because of the application that uses it) but that's something i need to keep in mind in the future. – Bill Jul 28 '14 at 20:34
  • 1
    Not sure if your IO is at field or row level... If row, each row of each table has a unique (ORA_ROWSCN) embedded. You could check this at a row level to find if an update is needed for the row, but not at a column level... additionally ORA_ROWSCN is an extremely precise timestamp... as seen if you `SELECT SCN_TO_TIMESTAMP(ORA_ROWSCN) from tableName` – xQbert Jul 28 '14 at 20:34
  • See http://stackoverflow.com/questions/2446764/oracle-update-statement-with-inner-join for examples on how to avoid mentioning the `'one'` expression twice in updates/merges. – halfbit Aug 08 '14 at 21:00
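Picking up on the NULL point in the comments, a NULL-safe variant of the question's predicate can be written with Oracle's `LNNVL` function (table and column names as in the question):

```sql
-- LNNVL(field_one = 'one') is TRUE when field_one != 'one'
-- and also when field_one IS NULL, so NULL rows get updated too
update table_name
   set field_one = 'one'
 where lnnvl(field_one = 'one');
```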

12 Answers

6

Given the nature of how SQL works, this is exactly what you need to do. If you tell it:

update table_name    
   set field_one = 'one';

that means something entirely different in SQL than

update table_name
   set field_one = 'one'
 where field_one != 'one';

The database can only process what you told it to process. In the first case, because there is no where clause, you have told it to process all the records.

In the second case you have put a filter on it to process only some specific records.

It is up to the code writer, not the database, to determine the content of the query. If you didn't want every record updated, you should not have told it to do so. The database is quite literal about the commands you give it. Yes, the second set of queries is longer because they are being more specific. They have a different meaning than the original queries. That is all to the good, as it is far faster to update the ten records you are interested in than all 1,000,000 records in the table.

You need to get over the idea that longer is somehow a bad thing in database queries. Often it is a good thing, as you are being more correct in what you are asking for. Your original queries were simply incorrect, and now you have to pay the price to fix what was a systemically bad practice.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • I'd argue that SQL should be *less* literal than other languages. The optimizer already transforms declarative queries significantly - joining tables and running predicates in whatever order it wants. It seems like it might as well have the ability to detect when a value is changed to itself and not actually write it to disk. – Jon Heller Aug 14 '14 at 17:13
  • @jonearles, you can argue it any way you want, databases don't work that way. Changing the order of joins, etc. does not change the result set; what you want does. The programmer of the database software cannot deduce your intent; perhaps you had a reason for updating the records (such as triggering a process in a trigger) even when the data didn't change. The internals of a database are quite complex, they don't need to be making assumptions about what you meant as well. – HLGEM Aug 14 '14 at 17:50
  • Yeah, you're right. There's always the potential for some process depending on one of the many side-effects of a change. – Jon Heller Aug 14 '14 at 19:15
  • I'm sorry for my ignorance... which are you saying is correct, with or without the where clause? – Bill Aug 18 '14 at 14:29
  • @Bill With the where clause. If you do not have the where clause, then you will update every record in the table. An update should almost never be written without a where clause. – HLGEM Aug 18 '14 at 14:47
4

I guess there isn't an easier way....

Bill
  • 1,237
  • 4
  • 21
  • 44
  • 2
    I'm pretty sure you are correct. If nobody can prove you wrong to collect the bounty, then you can be pretty confident in your answer. – Jon Heller Aug 07 '14 at 18:02
2

Create a view over the table and write a custom INSTEAD OF trigger:

--DDL create your sample table:
create table table_name (
   myPK int primary key,  -- a key so the trigger can target just the row being updated
   field_one varchar2(100)
  );

--DDL create the view
create view view_name as 
select * from table_name;    

--DDL creating the INSTEAD OF trigger
CREATE OR REPLACE TRIGGER trigger_name
   INSTEAD OF UPDATE ON view_name
   FOR EACH ROW
   BEGIN
     UPDATE table_name
     SET field_one = :NEW.field_one
     where myPK = :OLD.myPK
       and :OLD.field_one != :NEW.field_one
     ;
   END trigger_name;

--DML testing:
update view_name set field_one = 'one';

EDITED, test results

I have tested my approach in a 200K-row scenario with a 1/20 factor of updatable rows. Here are the results:

  • Script time updating the table directly: 1,559.05 secs
  • Script time updating through the trigger: 1,101.14 secs

Steps to reproduce the test:

Creating the table, view and trigger:

create table table_name (
   myPK int primary key,
   field_1 varchar2(100),
   field_2 varchar2(100),
   field_3 varchar2(4000)
  );

create view view_name as 
select * from table_name;

CREATE OR REPLACE TRIGGER trigger_name
   INSTEAD OF UPDATE ON view_name
   FOR EACH ROW
   BEGIN
     UPDATE table_name
     SET 
        field_1 = :NEW.field_1,
        field_2 = :NEW.field_2
     where 
        myPK = :OLD.myPK
        AND not ( :OLD.field_1 = :NEW.field_1 and
                  :OLD.field_2 = :NEW.field_2 )
     ;
   END trigger_name;

Inserting dummy data with 1/20 factor for updatable rows:

DECLARE
   x NUMBER := 300000;
BEGIN
   FOR i IN 1..x LOOP
      IF MOD(i,20) = 0 THEN     
         INSERT INTO table_name VALUES (i, 'rare', 'hello', 
                                           dbms_random.string('A', 2000));
      ELSE
         INSERT INTO table_name VALUES (i, 'ordinary', 'bye', 
                                           dbms_random.string('A', 2000) );
      END IF;
   END LOOP;
   COMMIT;
END;

Script for testing performance:

declare
    l_start number;
    l_end number;
    l_diff number;    
    rows2update int;

begin
   rows2update := 100000;
   l_start := dbms_utility.get_time;

   FOR i IN 1..rows2update LOOP

      update view_name    --<---- replace by table_name to test without trigger
         set field_1 = 'ordinary' 
       where myPK = round( dbms_random.value(1,300000) ); 
      commit;

   end loop;

   l_end := dbms_utility.get_time;
   dbms_output.put_line('l_start ='||l_start);
   dbms_output.put_line('l_end ='||l_end);
   l_diff := (l_end-l_start)/100;
   dbms_output.put_line('Elapsed Time: '|| l_diff ||' secs');

end;
/

Disclaimer: This is a simple test made in a virtual environment, only as a first-approach test. I'm sure that results can change significantly just by changing field length or other parameters.

dani herrera
  • 48,760
  • 8
  • 117
  • 177
  • This seems like the best idea so far. But could this approach be made generic so that it covers all column combinations? And how would it perform? – Jon Heller Aug 08 '14 at 22:29
  • The where clause may look like `where not (:old.x = :new.x and :old.y = :new.y and ... )`. I have not tested this approach; performance depends on the update granularity (number of rows updated at a time). You can test it and come back, we are talking about 20 DDL lines. – dani herrera Aug 09 '14 at 07:58
  • 2
    Won't this make the system more I/O bound? You're turning a bulk update into a row-by-row update, which will be considerably slower. – Ben Aug 09 '14 at 10:11
  • Even if 99% of the updates are one row at a time, all it takes is one need to manually run an update of all records for one client urgently, to bring the entire system down. – HLGEM Aug 14 '14 at 17:57
  • Hi @ben, I have made my own tests. Please, take a look. Best regards. – dani herrera Aug 15 '14 at 13:30
1

Try using a MERGE statement; it may reduce the running time of the update queries.

The above query can be rewritten like this:

MERGE INTO table_name
USING ( SELECT ROWID from table_name Where field_one != 'one') data_table
ON ( table_name.ROWID = data_table.ROWID)
WHEN MATCHED THEN
UPDATE SET table_name.field_one = 'one';
anandh04
  • 86
  • 4
1

Did I understand correctly that the problem here is repeating the value 'one' in the query?

If yes, then you can use the following:

update (select field_one, 'one' new_field_one from table_name)
   set field_one = new_field_one
 where field_one != new_field_one;
Kombajn zbożowy
  • 8,755
  • 3
  • 28
  • 60
  • Another good idea. But the limitations on DML Table Expressions may make this unusable for most real-world statements. – Jon Heller Aug 08 '14 at 22:33
  • Hmm, I tried several simple updates and everything was fine, even if 'one' is selected from a correlated subquery. [Here's](http://sqlfiddle.com/#!4/6971e/1) the fiddle to test. Yes, the [docs](http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_8004.htm) say that subquery in select list renders the view non-updatable. But any real example of when the update is impossible? – Kombajn zbożowy Aug 10 '14 at 20:45
  • Adding joins to the UPDATE will cause "non-key preserved" errors, unless a lot of specific conditions are met. But this is still a good solution, even if it works only part of the time. – Jon Heller Aug 14 '14 at 16:58
1

If your table has a column that in the end should consist of only one value - why do you need the column at all?

If for some reason you need the column, you can drop the column and then recreate it with the default value you want.
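A minimal sketch of that idea, assuming `field_one` is a `varchar2` and `'one'` is the constant you want (names taken from the question):

```sql
-- drop the column, then re-add it with the constant as its default;
-- with NOT NULL, 11g makes this a fast metadata-only change and
-- existing rows read the default instead of being rewritten
alter table table_name drop column field_one;
alter table table_name add (field_one varchar2(100) default 'one' not null);
```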

0

There is a simple explanation for this behavior:

any update, even with the same value, must fire triggers (both external and internal)

so, without these "idle" updates your system may stop working in the way it was designed to...

Further information: http://orainternals.wordpress.com/2010/11/04/does-an-update-statement-modify-the-row-if-the-update-modifies-the-column-to-same-value/
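For example, a hypothetical trigger like the one below (the `touch_log` table and names are illustrative only) relies on the update firing even when the value doesn't change; adding the extra `WHERE` clause would silently stop it from running for unchanged rows:

```sql
create table touch_log (touched_at timestamp);

create or replace trigger log_touch
   before update of field_one on table_name
   for each row
begin
   -- fires for every row the UPDATE touches,
   -- even when :NEW.field_one = :OLD.field_one
   insert into touch_log (touched_at) values (systimestamp);
end;
```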

PS. The solution "noneditioning view + INSTEAD OF trigger" solves the access problem, but an INSTEAD OF trigger is always a row-level trigger, so this approach can ruin performance.

kinjelom
  • 6,105
  • 3
  • 35
  • 61
  • thanks for the article! it was very enlightening but I missed the 'your system may stop working' part. Can you explain more? I didn't pick up on anything negative about not doing the update. The 'table block' is updated and that wouldn't happen when using a 'where' but I don't know why that's a bad thing. thanks again! – Bill Aug 08 '14 at 12:28
  • If there are triggers that should be raised even on idle update... you know, for example trigger that insert user access log... – kinjelom Aug 08 '14 at 14:46
0

I assume that you are looking for an idea rather than code, so here is mine.

You could use dynamic SQL statements to compose your update queries. Let's take the following query:

UPDATE table_name SET field_one = 'one', field_two = 'where,=' WHERE id = 1

Instead of calling the update statement directly, call a procedure which manages the query. This procedure should extract the SET clause, looking from the first SET up to the WHERE, or to the end of the statement if there is no WHERE. You should be careful with what is in the assignments, since an assigned value could itself contain a WHERE:

field_one = 'one', field_two = 'where,='

Then replace every `,` with AND and every `=` with `!=`, taking care not to touch the ones inside the assigned values:

field_one != 'one' AND field_two != 'where,='

Reattach it to the query's WHERE clause with an AND. If there is no WHERE clause, you have to add one:

UPDATE table_name 
SET field_one = 'one', field_two = 'where,=' 
WHERE id = 1 AND field_one != 'one' AND field_two != 'where,='

This way you can use your current queries with no modification, just a little wrapper call: update_wraper('Your query');.

PS: I'm assuming from your comment that you don't work with NULL values.

JCalcines
  • 1,236
  • 12
  • 25
0

Looking at your statement:

update table_name
   set field_one = 'one'
 where field_one != 'one';

The problem is the "field_one != 'one'" predicate. Using any conventional indexing scheme, this type of not equal predicate is going to cause a FULL TABLE SCAN, which is (likely) going to cause lots of I/O, which you're trying to avoid, particularly if the table is very large. So, what to do?

If the table is large, and the number of rows that satisfy the predicate is (relatively) small, we may be able to do something with function based indexes, and (since you're on 11g) cleverly hide the FBI behind a virtual column.

I'm thinking about something like this:

create table table_name(
   field_one varchar2(10),
   field_one_not_one generated always as
      (case when field_one = 'one' then null else field_one end) virtual
);
create index field_one_not_one_indx on table_name(field_one_not_one);

Now, just do:

update table_name
   set field_one = 'one'
 where field_one_not_one is not null;

Now, the update should do a FULL INDEX SCAN, but that's going to be much less I/O than an FTS, particularly if the number of rows that need to be updated is relatively small. The idea is that the FBI will only have the rows that need to be updated. As long as that number of rows is significantly less than the total rows in the table, this should be an effective indexing strategy.

Mark J. Bobak
  • 13,720
  • 6
  • 39
  • 67
  • I think the problem here is time spent writing unnecessary changes, not the time to retrieve the data. – Jon Heller Aug 08 '14 at 16:20
  • Is it? That was not clear to me. However, if the problem is the time to do the actual update, that's going to be (necessarily) proportional to the number of rows that need to be updated. My attempt above was to more efficiently identify the rows that need to be updated. (At least there was some room to work with there.) Once you know which rows need to be updated, I'm not sure how you can improve on the actual update. – Mark J. Bobak Aug 08 '14 at 17:23
0

There shouldn't be an easy way to achieve your objective. But if you have the Enterprise Edition, you have one possibility, though it is very difficult anyway: implement the Virtual Private Database (VPD) feature through DBMS_RLS on the tables with poor-performing updates, using a policy that calls a function that works this way:

  1. Detects the SQL that is actually executing the update by joining some dynamic performance views (V$VPD_POLICY, V$SQLAREA and, if they aren't enough, also V$OPEN_CURSOR or V$SQL_MONITOR) for the current USER or current SESSION_ID.
  2. Parses the UPDATE statement to detect the columns used in the SET clause and their assigned values. You could use a parser generator like ANTLR or a similar tool to build a Java parser; once your tests work, you can load the JAR into the database and wrap it with a PL/SQL procedure.
  3. Returns the additional where condition as the concatenation of the old_column_value != new_column_value pairs detected by the parsing step.

An example flow:

  1. Your application executes an UPDATE:

    UPDATE <table> t
        SET t.<col1> = 5,
            <col2> = :named_bind,
            <col3> = (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)
    
  2. The system activates the VPD policy and invokes the just described function before executing the update

  3. This function after executing the query on the dynamic performance views fetches the current executing statement

  4. Then parses the statement, which will return the 3 pairs (column_name, column_value):

     ("t.<col1>", "5"),
     ("<col2>", ":named_bind"),
     ("<col3>", "(SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)")

  5. The function returns the additional where clause:

     t.<col1> != 5 
     AND <col2> != :named_bind
     AND <col3> != (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)

  6. The system will convert the statement into:

     UPDATE <table> t
         SET t.<col1> = 5,
             <col2> = :named_bind,
             <col3> = (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)
     WHERE t.<col1> != 5 
         AND <col2> != :named_bind
         AND <col3> != (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)

I haven't tested any part of this, and the job to do is very long, but in my opinion it should work fine once it's done. One last thing: this method wouldn't work with statements using positional bind variables; with named ones it could be fine, but I'm not definitely sure about it. This may also be a very intensive job, so for frequent and (already) fast UPDATEs it should be avoided; you'll also need to find a way to exclude those fast statements from being processed.

Alessandro Rossi
  • 2,432
  • 17
  • 24
0

This question reminded me about an old AskTom conversation How to Update millions or records in a table

In a nutshell, it suggests the following solution to improve the performance of a large number of updates on a table:

  • CREATE TABLE new_table AS SELECT (do the update "here") FROM old_table
  • INDEX new_table grant on new table
  • ADD CONSTRAINTS ON new_table etc
  • DROP TABLE old_table
  • RENAME new_table TO old_table;

You can do that using parallel query, with "nologging" on most operations generating very little redo and no undo at all -- in a fraction of the time it would take to update the data.
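A minimal sketch of those steps, assuming a simple `table_name(id, field_one)` and omitting the index/grant/constraint DDL:

```sql
-- rebuild the table with the "update" applied during the copy;
-- nologging + parallel generate very little redo and no undo
create table new_table nologging parallel as
select id,
       'one' as field_one    -- the "update" happens here
  from table_name;

-- recreate indexes, grants and constraints on new_table, then swap:
drop table table_name;
rename new_table to table_name;
```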

However, I will keep it simple:

Create a temporary table containing the keys and values of the rows you intend to update, filtering out rows for which no update is actually required because the new value is the same as the old value:

CREATE GLOBAL TEMPORARY TABLE temp_table_name ON COMMIT DELETE ROWS
AS
SELECT id, 'one' as field_one FROM table_name WHERE field_one != 'one';

Update the main table from the temporary table:

MERGE INTO table_name b
USING 
(
  SELECT id,
         field_one
  FROM temp_table_name
) a ON (a.id = b.id)
WHEN MATCHED THEN UPDATE 
    SET b.field_one = a.field_one

Or, with a correlated update (Oracle does not support the UPDATE ... FROM join syntax):

UPDATE table_name b
   SET b.field_one = (SELECT a.field_one
                        FROM temp_table_name a
                       WHERE a.id = b.id)
 WHERE EXISTS (SELECT 1
                 FROM temp_table_name a
                WHERE a.id = b.id);


Peter O.
  • 32,158
  • 14
  • 82
  • 96
Adrian
  • 6,013
  • 10
  • 47
  • 68
-1

I would use a temp table ahead of your update: find the records that need to be updated, then update just those records from the temp table with the values you want.

mungea05
  • 110
  • 6