
Suppose we have the following problem:

  • Given a table with a single column X, filled with random integers from 1 to 100:

    CREATE TABLE xtable(x) AS 
       SELECT ceil(dbms_random.value * 100) 
         FROM dual
       CONNECT BY level <= 1000000;
    
  • We must delete the duplicate rows so that only the distinct integers remain in the table.
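
    A quick sanity check for any of the solutions below (after the delete, the two counts must be equal):

        SELECT count(*)          AS total_rows,
               count(DISTINCT x) AS distinct_values
          FROM xtable;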


Let's consider the three solutions (with average execution times and optimizer plans) below.

I must add that experiments show:

  • Solutions 1 and 2 are scalable and show linear time growth at each row-count step (tested with tables of up to 10 million rows)
  • Solution 3 shows exponential time growth, approximately 3 * exp(0.6 * N)

We see that for solution 2 the optimizer plan gives expectations unrelated to the experimental results, and even opposite to them:

  • cost and other values are almost the same in plans 2 and 3
  • execution times are practically the same for solutions 1 and 2

And in these experiments, the presence or absence of gathered statistics for the table affects neither the optimizer plans nor the execution times.


Please explain why I can't trust the optimizer plan in case 2.

What causes the optimizer to ignore the obvious difference between linear and exponential complexity?


Solutions:
1.

DELETE xtable WHERE rowid IN (
      SELECT ri FROM (
         SELECT rowid                                             AS ri,
                row_number() OVER(PARTITION BY x ORDER BY NULL) AS rn
           FROM xtable
      )
      WHERE rn > 1
)


Execution time: 14 - 16 sec

Plan:
------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows    | Bytes    | Cost | Time     |
------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT         |          | 1000000 | 15000000 | 5119 | 00:00:01 |
|   1 |   DELETE                 | XTABLE   |         |          |      |          |
| * 2 |    HASH JOIN SEMI        |          | 1000000 | 15000000 | 5119 | 00:00:01 |
|   3 |     TABLE ACCESS FULL    | XTABLE   | 1000000 |  3000000 |  280 | 00:00:01 |
|   4 |     VIEW                 | VW_NSO_1 | 1000000 | 12000000 | 2976 | 00:00:01 |
| * 5 |      VIEW                |          | 1000000 | 25000000 | 2976 | 00:00:01 |
|   6 |       WINDOW SORT        |          | 1000000 |  3000000 | 2976 | 00:00:01 |
|   7 |        TABLE ACCESS FULL | XTABLE   | 1000000 |  3000000 |  280 | 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 2 - access(ROWID="RI")
* 5 - filter("RN">1)

2.

DELETE xtable WHERE (x, rowid) NOT IN (SELECT x, min(rowid) FROM xtable GROUP BY x)

Execution time: 15 - 17 sec

Plan:
--------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows    | Bytes   | Cost      | Time     |
--------------------------------------------------------------------------------------
|  0 | DELETE STATEMENT          |        |   50000 |  150000 | 278162850 | 03:01:06 |
|  1 |   DELETE                  | XTABLE |         |         |           |          |
|  2 |    FILTER                 |        |         |         |           |          |
|  3 |     TABLE ACCESS FULL     | XTABLE | 1000000 | 3000000 |       281 | 00:00:01 |
|  4 |     FILTER                |        |         |         |           |          |
|  5 |      SORT GROUP BY NOSORT |        | 1000000 | 3000000 |       280 | 00:00:01 |
|  6 |       TABLE ACCESS FULL   | XTABLE | 1000000 | 3000000 |       280 | 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - access(INTERNAL_FUNCTION("X")=INTERNAL_FUNCTION("X") AND INTERNAL_FUNCTION(ROWID)=INTERNAL_FUNCTION("MIN(ROWID)"))
* 5 - filter(INTERNAL_FUNCTION(ROWID)=INTERNAL_FUNCTION("MIN(ROWID)") AND INTERNAL_FUNCTION("X")=INTERNAL_FUNCTION("X"))

3.

DELETE xtable a WHERE EXISTS (SELECT 1 FROM xtable b WHERE a.x = b.x AND a.rowid < b.rowid)

Execution time: 970 - 990 sec

Plan:
----------------------------------------------------------------------------------------------
| Id  | Operation                        | Name   | Rows    | Bytes   | Cost      | Time     |
----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT                 |        |   50000 |  300000 | 278208956 | 03:01:08 |
|   1 |   DELETE                         | XTABLE |         |         |           |          |
| * 2 |    FILTER                        |        |         |         |           |          |
|   3 |     NESTED LOOPS SEMI            |        |   50000 |  300000 | 278208956 | 03:01:08 |
|   4 |      TABLE ACCESS FULL           | XTABLE | 1000000 | 3000000 |       280 | 00:00:01 |
| * 5 |      TABLE ACCESS BY ROWID RANGE | XTABLE |   50000 |  150000 |       278 | 00:00:01 |
----------------------------------------------------------------------------------------------    
Predicate Information (identified by operation id):
------------------------------------------
* 2 - filter(:VAR2=:VAR1)
* 5 - access("B".ROWID>"A".ROWID)

Plans were obtained on Oracle 12.1.0.2.0

diziaq
  • What do you mean when you say "can't trust the optimizer plan in case 2"? And what makes you think the execution plans for 2 and 3 are similar? Execution plan 3 has two full table scans joined by a nested loop semi-join, and then filtered. Execution plan 2 has a sort followed by a filter, and then uses that to filter the results of a full table scan. This is much more similar to the first execution plan, IMHO. – Boneist Jul 05 '16 at 08:31
  • @Boneist, I mean the similarity of the total values in the Rows, Bytes, Cost and Time columns. It's surprising that they are almost the same, and that they change synchronously when we fill the table with different numbers of rows: 1,000, 10,000, ..., 10,000,000 – diziaq Jul 05 '16 at 08:57
  • Plus one for a nicely prepared question. Please provide the Oracle version and add *Predicate Information* to the explain plans. See [here](http://stackoverflow.com/questions/34975406/how-to-describe-performance-issue-in-relational-database?answertab=active#tab-top) for how to get that information. – Marmite Bomber Jul 05 '16 at 09:20

2 Answers


I cannot reproduce the second plan. Here is what I get:

-------------------------------------------------------------------------------------------                                       
| Id  | Operation              | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |                                       
-------------------------------------------------------------------------------------------                                       
|   0 | DELETE STATEMENT       |          |       |       |       |  3648 (100)|          |                                       
|   1 |  DELETE                | XTABLE   |       |       |       |            |          |                                       
|   2 |   MERGE JOIN ANTI NA   |          |   999K|    26M|       |  3648   (5)| 00:00:01 |                                       
|   3 |    SORT JOIN           |          |  1000K|  2929K|    22M|  3147   (3)| 00:00:01 |                                       
|   4 |     TABLE ACCESS FULL  | XTABLE   |  1000K|  2929K|       |   434   (3)| 00:00:01 |                                       
|*  5 |    SORT UNIQUE         |          |   100 |  2500 |       |   500  (16)| 00:00:01 |                                       
|   6 |     VIEW               | VW_NSO_1 |   100 |  2500 |       |   499  (16)| 00:00:01 |                                       
|   7 |      SORT GROUP BY     |          |   100 |   300 |       |   499  (16)| 00:00:01 |                                       
|   8 |       TABLE ACCESS FULL| XTABLE   |  1000K|  2929K|       |   434   (3)| 00:00:01 |                                       
-------------------------------------------------------------------------------------------        
Mottor
  • It looks like what I expected to get. It seems my DB has an anomaly, or tricky settings, or something else; it doesn't really matter now. The thing bothering me is how to know for sure how the DB will execute a query. I'm confused because I have never met such an issue before, and I thought that EXPLAIN PLAN at least approximately shows what to expect when the query runs. – diziaq Jul 05 '16 at 09:08
  • @diziaq Which Oracle version do you have? I have tested it on 11 and 12 and the execution plan is the same (if we don't count the costs). You will never be sure. You can use SQL Plan Baselines. – Mottor Jul 05 '16 at 09:17
  • I got the plans on Oracle 12.1 many times yesterday and today using different instances. After asking the question I tried the same queries on Oracle 11.2 and, like you, I couldn't reproduce the plan I asked about. So it looks like an accident, but I'm wondering what causes it. – diziaq Jul 05 '16 at 10:38

Please explain why I can't trust the optimizer plan in case 2.

You should never trust the optimizer. The CBO is right 95% of the time, but you do not know which 5% is wrong.

A typical problem is that the execution plan shown by EXPLAIN PLAN is not the same as the plan actually used during execution. (You do not say how you obtained the plans.)
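
For example, the actual plan of the last executed statement can be fetched with DBMS_XPLAN.DISPLAY_CURSOR (a sketch of the standard usage, with one of the deletes above standing in for the statement):

    -- run the statement with rowsource statistics collected
    DELETE /*+ GATHER_PLAN_STATISTICS */ xtable WHERE ...;  -- one of the solutions above

    -- then, in the same session, display the plan that was really used
    SELECT *
      FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));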

When in doubt, use DBMS_SQLTUNE.REPORT_SQL_MONITOR for long-running queries to see the actual plan and its problematic parts.
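
For example (a sketch; the sql_id is a placeholder you would take from v$sql_monitor or v$session):

    SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
             sql_id       => 'your_sql_id',  -- placeholder
             type         => 'TEXT',
             report_level => 'ALL') AS report
      FROM dual;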

What causes the optimizer to ignore the obvious difference between linear and exponential complexity?

See above, and forget comparing the costs of the plans. What you want to avoid when processing a whole table is NESTED LOOPS processing. This is exactly what happens in case 3.

 |  3 |     NESTED LOOPS SEMI            |       |   50000|  300000 | 278208956 | 03:01:08|
 |  4 |      TABLE ACCESS FULL           |XTABLE | 1000000| 3000000 |       280 | 00:00:01|
 |  5 |      TABLE ACCESS BY ROWID RANGE |XTABLE |   50000|  150000 |       278 | 00:00:01|

You want to see SORT and HASH JOIN operations, which is what plan 1 shows.

In my opinion, plan 2 will not scale with the number of duplicated records (simply try it with a table containing each row twice, as sketched below, and see if you get the same elapsed time as in case 3). The optimizer cannot estimate the number of duplicated records, so it defensively assumes a high number and therefore a high cost.
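
A minimal sketch of such a test table (the name is illustrative; every value occurs exactly twice, so half of the rows are duplicates):

    CREATE TABLE xtable_dup(x) AS
       SELECT ceil(level / 2)
         FROM dual
       CONNECT BY level <= 1000000;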

Second-to-last remark: the theory says you should not observe linear behaviour, but at best O(n * log(n)).

Last remark: your test data are not realistic for duplicate removal. Typically you have a large table with a small number of duplicates. In your setup, all records except 100 are duplicates.

The cost of the delete dominates the cost of finding the duplicates, so you observe linear behaviour.

Try it with:

CREATE TABLE xtable(x) AS 
   SELECT ceil(dbms_random.value * 100000000) 
     FROM dual
   CONNECT BY level <= 1000000;

SELECT count(*) AS total, count(*) - count(DISTINCT x) AS to_be_deleted FROM xtable;
     TOTAL TO_BE_DELETED
---------- -------------
   1000000          5083  

So you will remove about 0.5% of the records. Now scale up, and you will observe a completely different pattern.
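
For example, a scaled-up variant of the same setup (a sketch; the table name and row count are illustrative):

    CREATE TABLE xtable_big(x) AS
       SELECT ceil(dbms_random.value * 100000000)
         FROM dual
       CONNECT BY level <= 10000000;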

Marmite Bomber
  • Thank you for the useful info. I agree about O(n * log(n)), but surprisingly the results are: 100,000 rows: 1.65 sec; 1,000,000 rows: 15.8 sec; 10,000,000 rows: 138.6 sec; even a little lower than might be expected. I suppose the low quality of the test data neutralizes the usual theoretical effects. – diziaq Jul 05 '16 at 10:21
  • @diziaq added explanation – Marmite Bomber Jul 05 '16 at 11:17