
Context

We have a model in which each element has an element kind and from 0 to N features. Each feature belongs to only one element and has a feature name.

This is modeled as the following tables:

ELEMENTS
  elem_id int not null -- PK
  elem_elki_id int not null -- FK to ELEMENT_KINDS
  -- more columns with elements data

ELEMENT_KINDS
  elki_id int not null -- PK
  -- more columns with elements kinds data

FEATURES
  feat_id int not null -- PK
  feat_elem_id int not null -- FK to ELEMENTS  
  feat_fena_id int not null -- FK to FEATURE_NAMES
  -- more columns with features data

FEATURE_NAMES
  fena_id int not null -- PK
  -- more columns with feature_names data


Requirement

There is a new requirement to replace the feature names table with a feature kinds table. There is one (and only one) feature kind for each (element kind, feature name) pair.

The changes to the model were adding a new column and creating a new table:

ALTER TABLE features ADD feat_feki_id int null;

CREATE TABLE FEATURE_KINDS
(
  feki_id int not null, -- PK
  feki_elki_id int not null, -- FK to ELEMENT_KINDS
  feki_fena_id int null, -- FK* to FEATURE_NAMES
  -- more columns with feature kinds data
);

*feki_fena_id is actually a temporary column showing which feature name was used to create each feature kind. After feat_feki_id has been populated, feki_fena_id should be discarded along with feat_fena_id and the FEATURE_NAMES table.
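
For reference, FEATURE_KINDS could have been populated from the distinct (element kind, feature name) pairs already in use with something like the sketch below (the sequence feki_seq is an assumed PK sequence, not part of the original model):

-- Sketch only: derive one feature kind per distinct
-- (element kind, feature name) pair currently in use.
-- Assumes a sequence feki_seq exists for the primary key.
INSERT INTO feature_kinds (feki_id, feki_elki_id, feki_fena_id)
SELECT feki_seq.NEXTVAL, elem_elki_id, feat_fena_id
FROM
(
    SELECT DISTINCT e.elem_elki_id, f.feat_fena_id
    FROM features f
    INNER JOIN elements e ON e.elem_id = f.feat_elem_id
);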


Problem

After successfully populating the feature kinds table, we are trying to populate the feat_feki_id column using the following query:

MERGE INTO features F
USING
(
    SELECT *
    FROM elements
    INNER JOIN feature_kinds
    ON elem_elki_id = feki_elki_id
) EFK
ON
(
    F.feat_elem_id = EFK.elem_id AND
    F.feat_fena_id = EFK.feki_fena_id
)
WHEN MATCHED THEN
UPDATE SET F.feat_feki_id = EFK.feki_id;

This works in small-scale scenarios with test data, but in production, where we have ~20 million elements and ~2,000 feature kinds, it runs for about an hour before throwing an ORA-30036: unable to extend segment by 8 in undo tablespace 'UNDOTBS1' error.


Question

Is there any way I could improve the performance of the MERGE so that it works? (Maybe I'm lacking some indexes?)

Is there an alternative way to fill in the feat_feki_id column? (We have already tried UPDATE instead of MERGE, with similar results.)

Sam
  • So make the segment bigger, or break up the update by some field that has a set of range values – OldProgrammer Dec 27 '16 at 20:33
  • Could you elaborate on the breaking-up part, please? (Making the segment bigger doesn't seem like an option... it already takes more than an hour before that problem emerges) – Sam Dec 27 '16 at 20:34

1 Answer


It's not clear whether there is something wrong going on or whether your undo segments are just too small. Can you do the following statement without getting an ORA-30036?

UPDATE features f SET f.feat_feki_id = 12345;

If that doesn't work, you just need to increase the size of your undo segment. Kludges are available to do the update in chunks, but you really shouldn't have to do that.
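
If it did come to that, a chunked kludge might look like the sketch below. This is only a sketch: it relies on feat_feki_id being NULL for every unprocessed row, and on every feature matching exactly one feature kind (which the stated requirement guarantees; otherwise the loop would never terminate).

-- Sketch of a chunked update, only for when the undo
-- tablespace truly cannot be grown. Updates and commits
-- 100,000 rows at a time, using feat_feki_id IS NULL to
-- find the rows not yet processed.
BEGIN
  LOOP
    UPDATE features f
    SET f.feat_feki_id =
      ( SELECT fk.feki_id
        FROM elements e
        INNER JOIN feature_kinds fk
           ON fk.feki_elki_id = e.elem_elki_id
        WHERE e.elem_id = f.feat_elem_id
          AND fk.feki_fena_id = f.feat_fena_id )
    WHERE f.feat_feki_id IS NULL
      AND ROWNUM <= 100000;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
  COMMIT;
END;
/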

Assuming it's NOT a simple UNDO size issue, one thing you might do is make sure that your MERGE (or UPDATE) is updating rows in the order they appear in your table. Otherwise, you could be revisiting the same blocks over and over, really hurting performance and increasing UNDO usage. I encountered this in a similar operation I had to do a few years ago and I was shocked when I finally figured it out.

To avoid the problem I had, you would want something like this:

MERGE INTO features F
USING
(
    SELECT f.feat_id, fk.feki_id
    FROM features f
    INNER JOIN elements e ON e.elem_id = f.feat_elem_id
    INNER JOIN feature_kinds fk ON fk.feki_elki_id = e.elem_elki_id and fk.feki_fena_id = f.feat_fena_id
    -- Order by the ROWID of the table you are updating to ensure you are not revisiting the same block over and over
    ORDER BY f.rowid
) EFK
ON
(
    F.feat_id = EFK.feat_id
)
WHEN MATCHED THEN
UPDATE SET F.feat_feki_id = EFK.feki_id;

I may have gotten your data model wrong, but the key point is to include the FEATURES table in the MERGE query and ORDER BY features.rowid to ensure that the updates happen in row order.

Matthew McPeak
  • Thanks, I'll test this when possible. Even if I have an UNDO tablespace issue, the query taking more than an hour seemed like something was wrong. For the moment, my worry is that the `features` table is almost as big as the `elements` table, so the double join you suggest might result in a huge intermediate table. Do you think that would be a problem? – Sam Dec 27 '16 at 21:58
  • Yes, it could be a problem. Joining two 20M-row tables (plus a third, smaller table) and sorting the results is taxing. Use parallel query, if you can, to help. The query plan should involve a lot of full scans and hash joins, so you should be able to query `V$SESSION_LONGOPS` as it runs to see how the query is progressing (a sketch of such a check follows this thread). – Matthew McPeak Dec 27 '16 at 22:06
  • This solution worked, but the `ORDER BY` caused massive use of the `TEMP` tablespace, around 50 GB in my case. To make it work, I had to create an additional file for that tablespace, just as this answer shows: http://stackoverflow.com/a/25350967 – Sam Jan 23 '17 at 17:24
  • Thanks for the feedback! I wonder if there is a way to write the query so that it would naturally be ordered by `features.rowid` and, if so, whether it would be safe to rely on that behavior. I'm thinking maybe a hint to cause `FEATURES` to be the so-called "probe table" in the hash join instead of the "build table". It may be worth further investigation if you need to get it even faster and/or get rid of the 50 GB temp usage. – Matthew McPeak Jan 23 '17 at 17:57
  • As for performance, the whole query ran in less than an hour, and it is not supposed to be run very often, so your current solution will do the trick. I'll consider your other suggestion regarding the hints for avoiding the ORDER BY. – Sam Jan 23 '17 at 18:36
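
For reference, the kind of `V$SESSION_LONGOPS` check mentioned in the comments could look like this sketch, run from a second session while the MERGE executes:

-- Sketch: watch the progress of long-running operations
-- (full scans, hash joins, sorts) from another session.
SELECT sid, opname, target, sofar, totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done,
       time_remaining
FROM v$session_longops
WHERE sofar < totalwork
ORDER BY time_remaining;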