1

I am trying to improve the performance of the query shown below by rewriting a conditional LEFT JOIN as a UNION of INNER JOIN and the base table. I am using Oracle 12c.

The tables in concern are ASSIGNMENTS and CLASSES. The requirement is that for a given ITEM_ID value, we need to fetch details from the corresponding records in the two tables.

  • There exists only one CLASSES table record each ITEM_ID.
  • The ASSIGNMENTS table may or may not have a record for any given ITEM_ID from the CLASSES table.
  • The ASSIGNMENTS record details are required only when the CLASSES record has enable_capacity = 'Y'.

The query that I have come up with is:

WITH CLASSES_WITH_CAPACITY AS (
  SELECT maximum_attendees,  item_id,  enable_capacity
  FROM CLASSES classes WHERE enable_capacity = 'Y' AND classes.item_id = 123 
), CLASSES_WITHOUT_CAPACITY AS (
  SELECT maximum_attendees,  item_id,  enable_capacity
  FROM CLASSES classes WHERE enable_capacity is null AND item_id = 123 
) 
SELECT maximum_attendees, item_id, max_position_value, 
  enable_capacity, enable_waitlist FROM (
    (SELECT classes.maximum_attendees, classes.item_id,
      MAX(assignments.wait_position) max_position_value,
      classes.enable_capacity, classes.enable_waitlist
     FROM CLASSES_WITH_CAPACITY classes 
      JOIN WLF_ASSIGNMENT_RECORDS_F assignments ON (classes.item_id = assignments.item_id) 
      WHERE ( assignments.status <> 'EXPIRED')
      GROUP BY classes.item_id, classes.maximum_attendees, 
      classes.enable_capacity, classes.enable_waitlist )
    UNION ALL
    (SELECT classes.maximum_attendees, classes.item_id, 
      null AS max_position_value, 
      classes.enable_capacity, classes.enable_waitlist
      FROM CLASSES_WITHOUT_CAPACITY classes
    )
   );

In essence, the WITH clause statements will be mutually exclusive - that is, only one of the WITH clause queries will have the single CLASSES record with the given ITEM_ID. Below is the output of the EXPLAIN PLAN for the above query:

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                  |     2 |   244 |     8  (13)| 00:00:01 |
|   1 |  VIEW                                    |                  |     2 |   244 |     8  (13)| 00:00:01 |
|   2 |   UNION-ALL                              |                  |       |       |            |          |
|   3 |    HASH GROUP BY                         |                  |     1 |   111 |     6  (17)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN                 |                  |     1 |   111 |     5   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS BY INDEX ROWID         | ASSIGNMENTS      |     1 |    75 |     4   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN                   | ASSIGNMENTS_N9   |     9 |       |     1   (0)| 00:00:01 |
|   7 |      BUFFER SORT                         |                  |     1 |    36 |     2  (50)| 00:00:01 |
|*  8 |       TABLE ACCESS BY INDEX ROWID BATCHED| CLASSES          |     1 |    36 |     1   (0)| 00:00:01 |
|*  9 |        INDEX RANGE SCAN                  | CLASSES_N2       |     1 |       |     0   (0)| 00:00:01 |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED   | CLASSES          |     1 |    36 |     2   (0)| 00:00:01 |
|* 11 |     INDEX RANGE SCAN                     | CLASSES_N2       |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Now, my concern is regarding the join type being MERGE JOIN CARTESIAN. It looks like it is using the MERGE JOIN CARTESIAN for the INNER JOIN between CLASSES and ASSIGNMENTS.

My initial impression is that this won't lead to poor performance, only because the join would be performed on only one record. Is this correct?

What is the reason for the use of the Cartesian Join Type at all?

Given the fact that the Cartesian Join type is used when the JOIN predicates are missing and/or the joined predicates do not meet the proper primary-foreign key relationships, is it that the Optimizer does not find any indexes on the CLASSES alias? That is, is the use of the WITH clause obscuring away the indexes present in the table inside the WITH clause when joining the WITH Clause further?

If so, should the use of WITH clause be discouraged - especially when the output of the WITH clause is being joined further?

Also, any other suggestions on my approach are also welcome. Thanks!

Sarath Chandra
  • 1,850
  • 19
  • 40
  • Yes, the Oracle impression is that the `MERGE JOIN CARTESIAN`will be performed with one row. See the corresponding `Rows` column. Also all tables are accesed with `INDEX RANGE SCAN` (lines 6, 9, 11). If you look at the Predicate section of the execution plan (that is cut off) you will see the access and filter predicates. See [here](https://stackoverflow.com/questions/34975406/how-to-describe-performance-issue-in-relational-database?answertab=votes#tab-top) how to get full execution plan. – Marmite Bomber Nov 16 '18 at 06:29
  • 1
    You get no value from the WITH clauses here. You could simplify the query immensely by just having both sides of the UNION ALL operator query directly from CLASSES. – APC Nov 16 '18 at 06:58
  • @APC Do you mean SELECT ... FROM ASSIGNMENTS JOIN CLASSES where enable_capacity = 'Y' UNION ALL SELECT ... FROM CLASSES where enable_capacity IS NULL? – Sarath Chandra Nov 16 '18 at 08:33
  • Yes that's what I mean – APC Nov 16 '18 at 09:04
  • @APC Thanks a lot. It does make sense. I would still like to know if using the WITH clause has an impact on the join. – Sarath Chandra Nov 16 '18 at 09:37
  • 1
    The amount of data you're working with is too small to draw any conclusions. But generally, the answer is *it depends*. WITH clauses can be very performative when used properly e.g. when a subquery is used more than once or to handle recursion. But they can also have negative impact, say if the optimizer decides to materialize them and that turns out to be a bad option. So the better practice is: understand what WITH clause does and only use it when appropriate to do so. – APC Nov 16 '18 at 09:43
  • @APC understood. Any specific further reading resources/links that you recommend for this to get better conceptual understanding? – Sarath Chandra Nov 16 '18 at 09:47

1 Answers1

0

Why is used CARTESIAN JOIN?

As you stated there is at most one record in the table CLASSES with a given item_id. Oracle realizes this (see colum Rows in the line 9 of the execution plan) and decides to make a cartesian join of this one row with the expected 9 rows of the table ASSIGNMENTS.

This works fine if the cardinality estimation is fine, but can cause big troubles if it is inaccurate. I suppose that the item_id is not backed by a primary key or a unique index - see the INDEX RANGE SCAN in line 9; I'd expect INDEX UNIQUE SCAN in case of unique index. So this is a potential cause of problems.

Usage of WITH clauses

As mentioned by others the WITH clauses in your query has no effect, but also makes no harm. As you see in the execution plan they are merged in the main query. Oracle is re-writing the query and eliminates both subqueries.

Improve the Performance

You din't stated what problems you tried to approach by eliminating LEFT OUTER JOIN; anyway there could be a subtle point where some care should be taken.

Basically there are two possibilities to execute the plan

1) join first and than aggregate to get the MAX(wait_position)

2) get the MAX value first and than join (trivial) the two one row tables

If the table ASSINMENT has only imited number of rows with the given ITEM_ID there is practically no difference between those two options.

The troubles starts if you encounters an assignment with tons of rows for some ITEM_ID. This is sometimes called as a dying in the NESTED LOOPS as you loop on millions of rows only after that aggregate the result to one row containing the MAX.

Oracle has for this case an index access INDEX RANGE SCAN (MIN/MAX) which get the maximum value direct from the index (without accessing the table and without scanning all the values - remember index is sorted, so get MAX value is trivial).

So under the assumtion of those indexs, you may try the alternative query below

unique index on classes(item_id) index on WLF_ASSIGNMENT_RECORDS_F(item_id, wait_position)

The second index is important as you will bypass the table at all and you it to the the max(wait_position)

The query (simplified by ommiting some columns) is as follows:

with max_wait as (
select  max(a.wait_position) max_wait_position
from assignments a
where a.item_id = 1)
select c.item_id, m.max_wait_position
from classes c
left outer join max_wait m
on  c.enable_capacity = 'Y'
where c.item_id = 1;

Note that the subquery calculates the max(wait_position). In the next step you outer joins the table to the subquery, but only if c.enable_capacity = 'Y' - no further predicate is needed as both row sources has maximal one row.

Execution plan is very effective consisting of tow index access and one table block access.

----------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                 |     1 |    18 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER              |                 |     1 |    18 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID    | CLASSES         |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN             | CLASSES_UX1     |     1 |       |     0   (0)| 00:00:01 |
|   4 |   VIEW                           | VW_LAT_1D42B1AA |     1 |    13 |     2   (0)| 00:00:01 |
|*  5 |    FILTER                        |                 |       |       |            |          |
|   6 |     VIEW                         |                 |     1 |    13 |     2   (0)| 00:00:01 |
|   7 |      SORT AGGREGATE              |                 |     1 |     9 |            |          |
|   8 |       FIRST ROW                  |                 |     1 |     9 |     2   (0)| 00:00:01 |
|*  9 |        INDEX RANGE SCAN (MIN/MAX)| ASSIGNMENTS_IX1 |     1 |     9 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("C"."ITEM_ID"=1)
   5 - filter("C"."ENABLE_CAPACITY"='Y')
   9 - access("A"."ITEM_ID"=1)

I like this example as a demonstration, that a simple and compact written query leads to an effective execution plan.

Final remark, I hope in your production environment you use bind variables and not literal keys for the item_id;)

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53