6

Problem

I'm trying to refactor a low-performing MERGE statement to an UPDATE statement in Oracle 12.1.0.2.0. The MERGE statement looks like this:

MERGE INTO t
USING (
  SELECT t.rowid rid, u.account_no_new
  FROM t, u, v
  WHERE t.account_no = u.account_no_old
  AND t.contract_id = v.contract_id
  AND v.tenant_id = u.tenant_id
) s
ON (t.rowid = s.rid)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new

It is mostly low performing because there are two expensive accesses to the large (100M rows) table t

Schema

These are the simplified tables involved:

  • t The target table whose account_no column is being migrated.
  • u The migration instruction table containing a account_no_oldaccount_no_new mapping
  • v An auxiliary table modelling a to-one relationship between contract_id and tenant_id

The schema is:

CREATE TABLE v (
  contract_id NUMBER(18) NOT NULL PRIMARY KEY,
  tenant_id NUMBER(18) NOT NULL
);
CREATE TABLE t (
  t_id NUMBER(18) NOT NULL PRIMARY KEY,
  -- tenant_id column is missing here
  account_no NUMBER(18) NOT NULL,
  contract_id NUMBER(18) NOT NULL REFERENCES v
);
CREATE TABLE u (
  u_id NUMBER(18) NOT NULL PRIMARY KEY,
  tenant_id NUMBER(18) NOT NULL,
  account_no_old NUMBER(18) NOT NULL,
  account_no_new NUMBER(18) NOT NULL,

  UNIQUE (tenant_id, account_no_old)
);

I cannot modify the schema. I'm aware that adding t.tenant_id would solve the problem by preventing the JOIN to v

Alternative MERGE doesn't work:

ORA-38104: Columns referenced in the ON Clause cannot be updated

Note, the self join cannot be avoided, because this alternative, equivalent query leads to ORA-38104:

MERGE INTO t
USING (
  SELECT u.account_no_old, u.account_no_new, v.contract_id
  FROM u, v
  WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new

UPDATE view doesn't work:

ORA-01779: cannot modify a column which maps to a non-key-preserved table

Intuitively, I would apply transitive closure here, which should guarantee that for each updated row in t, there can be only at most 1 row in u and in v. But apparently, Oracle doesn't recognise this, so the following UPDATE statement doesn't work:

UPDATE (
  SELECT t.account_no, u.account_no_new
  FROM t, u, v
  WHERE t.account_no = u.account_no_old
  AND t.contract_id = v.contract_id
  AND v.tenant_id = u.tenant_id
)
SET account_no = account_no_new

The above raises ORA-01779. Adding the undocumented hint /*+BYPASS_UJVC*/ does not seem to work anymore on 12c.

How to tell Oracle that the view is key preserving?

In my opinion, the view is still key preserving, i.e. for each row in t, there is exactly one row in v, and thus at most one row in u. The view should thus be updatable. Is there any way to rewrite this query to make Oracle trust my judgement?

Or is there any other syntax I'm overlooking that prevents the MERGE statement's double access to t?

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • Can you make `u.account_no_old` unique? A unique index would do it. (By the way, `bypass_ujvc` was removed in 11.2.) – William Robertson Aug 24 '18 at 11:09
  • @WilliamRobertson. No, account_no values are unique on a per-tenant basis, hence the join of `v`. – Lukas Eder Aug 24 '18 at 11:12
  • Well, that's why Oracle sees it as not key-preserved. The usual alternative is a PL/SQL cursor, either as a Cursor FOR loop or a bulk collect + `forall`. – William Robertson Aug 24 '18 at 11:15
  • @WilliamRobertson: There *is* a unique constraint on `(tenant_id, account_no_old)`, and I would love it to be applied transitively, given that I can formally prove it to be applicable here (I think). PL/SQL cursors would run *much* slower, I had tested those options too. The `MERGE` is quite faster. – Lukas Eder Aug 24 '18 at 11:31
  • You have probably both `tenant_id, account_no_old` and `tenant_id, contract_id` *unique*, but what is missing it the uniqueness of `account_no_old, contract_id`. – Marmite Bomber Aug 24 '18 at 11:39
  • @MarmiteBomber: The columns `account_no_old` and `contract_id` are not in the same table. Did you mean `(account_no, contract_id)`? But how would that change anything? Those columns are on `t`, which doesn't require any unique constraints as it is the table being updated. – Lukas Eder Aug 24 '18 at 11:48
  • I'm surprised that the bulk collect + forall approach is massively slower. I would expect it slightly slower than the equivalent SQL UPDATE or MERGE (if those worked, which they don't). Surely it's conceptually the same as your proposed update, it's just an annoyance to have to write it all out in procedural code to avoid the key preservation check. – William Robertson Aug 24 '18 at 13:54
  • @WilliamRobertson: Could've been a memory issue, given that all 100M+ rows were affected and loaded into PGA. Oracle EM was completely green (CPU), so that could've been a likely explanation. I didn't analyse this very thoroughly, though. – Lukas Eder Aug 24 '18 at 15:19
  • For signifcant volumes you would use a `limit` clause with your `bulk collect` and code it as a loop. – William Robertson Aug 24 '18 at 15:25
  • 1
    Sorry I misread the PK on `V`, which effectively means each `contract_id` has *exact one* `tenant_id`. So your join is - I'd call it - *count preserving*. But IMHO Oracle can preserve keys only if joining on FK to PK which can be transitive on more tables but is not fullfiled in your setup. – Marmite Bomber Aug 24 '18 at 17:25
  • @MarmiteBomber: I like the term *count preserving*. That's what this updatable view feature should be about, although it might be more difficult to formally prove... – Lukas Eder Aug 25 '18 at 09:38
  • I'm not sure that you need the subselect. Can't you do this with an update statement against table t? – Bobby Durrett Sep 28 '18 at 23:59
  • @BobbyDurret: Show me in an answer :) (and beware of the joins) – Lukas Eder Oct 01 '18 at 07:20
  • 1
    Added my answer. Not sure if I'm missing something. – Bobby Durrett Oct 01 '18 at 21:39

3 Answers3

1

You may define a temporary table containing the pre-joined data from U and V.

Back it with a unique index on contract_id, account_no_old (which should be unique).

Then you may use this temporary table in an updateable join view.

create table tmp as
  SELECT v.contract_id, u.account_no_old, u.account_no_new
  FROM u, v
  WHERE  v.tenant_id = u.tenant_id;

create unique index tmp_ux1 on tmp ( contract_id, account_no_old);


UPDATE (
  SELECT t.account_no, tmp.account_no_new
  FROM t, tmp
  WHERE t.account_no = tmp.account_no_old
  AND t.contract_id = tmp.contract_id
)
SET account_no = account_no_new
;
William Robertson
  • 15,273
  • 4
  • 38
  • 44
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Yes, that would work of course. I had tried doing in-memory collections with `BULK FETCH INTO`, which was slower than the `MERGE` solution. Haven't tested this approach... – Lukas Eder Aug 24 '18 at 11:34
  • @LukasEder give it a try;) Proper hash join possible with `parallel DML` enabled will do the best job. – Marmite Bomber Aug 24 '18 at 11:41
  • I'll stay clear of parallel DML for this case. I got burnt heavily by some arcane bug. See details here: https://twitter.com/AndrewSayer_/status/1017085117725016064 – Lukas Eder Aug 24 '18 at 11:44
1

Is there any way to rewrite this query to make Oracle trust my judgement?

I've managed to "convince" Oracle to do MERGE by introducing helper column in target:

MERGE INTO (SELECT (SELECT t.account_no FROM dual) AS account_no_temp,
                    t.account_no, t.contract_id 
            FROM t) t
USING (
  SELECT u.account_no_old, u.account_no_new, v.contract_id
  FROM u, v
  WHERE v.tenant_id = u.tenant_id
) s
ON (t.account_no_temp = s.account_no_old AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;

db<>fiddle demo


EDIT

A variation of idea above - subquery moved directly to ON part:

MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (
      SELECT u.account_no_old, u.account_no_new, v.contract_id
      FROM u, v
      WHERE v.tenant_id = u.tenant_id
    ) s
ON ((SELECT t.account_no FROM dual) = s.account_no_old
     AND t.contract_id = s.contract_id)
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;

db<>fiddle demo2

Related article: Columns referenced in the ON Clause cannot be updated

EDIT 2:

MERGE INTO (SELECT t.account_no, t.contract_id FROM t) t
USING (SELECT u.account_no_old, u.account_no_new, v.contract_id
       FROM u, v
       WHERE v.tenant_id = u.tenant_id) s
ON((t.account_no,t.contract_id,'x')=((s.account_no_old,s.contract_id,'x')) OR 1=2) 
WHEN MATCHED THEN UPDATE SET t.account_no = s.account_no_new;

db<>fiddle demo3

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • 1
    Hah, that's hilarious! :-) Databases trying to be smart about preventing some SQL usage... Go figure. – Lukas Eder Oct 01 '18 at 07:03
  • Thanks for linking to my blog post :) Notice that I think both your workarounds will prevent index access on the `t.account_no` column. It might be better to apply one of the other workarounds as mentioned in the linked article. Specifically the one where a row value expression is used `ON ((t.account_no, 'dummy') = ((s.account_no_old, 'dummy')) AND t.contract_id = s.contract_id)` – Lukas Eder Jan 03 '19 at 08:45
  • @LukasEder It looks like the approach with `dummy` is not working for this case. **[demo](https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=e697daa971ad5e4b5753c71d47e50605)** – Lukasz Szozda Jan 03 '19 at 15:30
  • Argh! It works if you replace `AND` by `OR`, though (discounting the significant semantic change, of course) o_O. The `NVL()` solution still seems to work: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=30daa542fc92340b776c51ad2e94d994 – Lukas Eder Jan 04 '19 at 10:40
  • 1
    @LukasEder heh, it is even more ridiculous :) **[demo](https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=c3ee060a69449ff4acb28805ed68ae8a)** `ON ((t.account_no,t.contract_id, 'dummy') = ((s.account_no_old,s.contract_id, 'dummy')) OR 1=2)` – Lukasz Szozda Jan 04 '19 at 15:25
  • 1
    This is really wacko! – Lukas Eder Jan 07 '19 at 08:55
1

Trying to do this with a simpler update. Still requires a subselect.

update t
set t.account_no = (SELECT u.account_no_new
  FROM u, v
  WHERE t.account_no = u.account_no_old
  AND t.contract_id = v.contract_id
  AND v.tenant_id = u.tenant_id);

Bobby

Bobby Durrett
  • 1,223
  • 12
  • 19
  • Thanks a lot. It does work in my reduced example. I remember it didn't work in the real world example - but I can't remember why anymore. – Lukas Eder Oct 02 '18 at 07:42
  • I wasn't sure if there was something the original statement did that this update didn't cover. But I made a simple example and it worked in that case. – Bobby Durrett Oct 02 '18 at 20:41