0

I have Oracle database with a main table contain 9 000 000 rows and a second with 19 000 000 rows.

When I do :

SELECT *
FROM main m
INNER JOIN second s ON m.id = s.fk_id AND s.cd = 'E' AND s.line = 1

It's take 45 seconds to get the first part of the result, even with all the index below :

CREATE INDEX IDX_1 ON SECOND (LINE, CD, FK_ID, ID);
CREATE INDEX IDX_1 ON SECOND (LINE, CD);
MAIN (ID) AS PRIMARY KEY

Any idea how to do it faster ? I try some index, rebuild but it's always take 45 seconds

Here is the execution plan :

------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                 | Rows    | Bytes      | Cost   | Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                      | 8850631 | 2133002071 | 696494 | 00:00:28 |
| * 1 |   HASH JOIN          |                      | 8850631 | 2133002071 | 696494 | 00:00:28 |
| * 2 |    TABLE ACCESS FULL | SECOND               | 8850631 |  646096063 | 143512 | 00:00:06 |
|   3 |    TABLE ACCESS FULL | MAIN                 | 9227624 | 1550240832 | 153363 | 00:00:06 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("M"."ID"="S"."FK_ID")
* 2 - filter("S"."CD"='D' AND "S"."LINE"=1)

Thanks

Bosshoss
  • 783
  • 6
  • 24
  • Is it possible for your data to add a unique(or primary) key for those four columns rather than this current index ? If possible, probably you'll able to get a performance gain. – Barbaros Özhan Nov 02 '20 at 15:50
  • 2
    Post the query plan. How selective are your predicates? – Justin Cave Nov 02 '20 at 15:55
  • Well if only one table is indexed, you will have to `full table scan` the other - this explaind our 45 seconds. If you want to see the **first record fast** you must have index on `second` table to access `s.cd = 'E' AND s.line = 1` and index on `main` to access on the join columns. Please see [here](https://stackoverflow.com/a/34975420/4808122) the additional information you should provide. – Marmite Bomber Nov 02 '20 at 15:59
  • *Important to know* - how many rows match the predicate `s.cd = 'E' AND s.line = 1`? How many rows do you expect the join will return? – Marmite Bomber Nov 02 '20 at 16:05
  • Looks to me that `fk_id` is more selective than the other columns; if that's the case, then it should be placed first in the index, as in: `second (fk_id, cd, line)`. – The Impaler Nov 02 '20 at 16:21
  • @TheImpaler - this index will not help much with the predicate `s.cd = 'E' AND s.line = 1` (other than with some `Index Skip Scan` access that will not help too much if there are lot of `line`s). – Marmite Bomber Nov 02 '20 at 17:12
  • 1
    @MarmiteBomber Unfortunately I agree. – The Impaler Nov 02 '20 at 17:24
  • If `fk_id` is first in the index then you would be relying on join condition to obtain the value you're driving the index. This will be every different value in `main`.`id` (there's no filters on `main`), this looks like a foreign key relation so you will end up reading all over the index to cover every single possible value of `second`.`fk_id`. This will be slower than just using the equality conditions that exist against the existing first two columns of the index - these will allow only the part of the index that matches those filters to be read - most likely a lot less than the alternative – Andrew Sayer Nov 02 '20 at 20:07

1 Answers1

2

If you want to see the first line quickly you have to enable Oracle to use the NESTED LOOP join.

This will required an index on second with the two columns you constraint in your query and an index on main on the join column id

create index second_idx on second(line,cd);
create index main_idx on main(id);

You'll see an execution plan similar to one below

--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |    87 |  8178 |   178   (0)| 00:00:03 |
|   1 |  NESTED LOOPS                 |            |       |       |            |          |
|   2 |   NESTED LOOPS                |            |    87 |  8178 |   178   (0)| 00:00:03 |
|   3 |    TABLE ACCESS BY INDEX ROWID| SECOND     |    87 |  2523 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SECOND_IDX |     1 |       |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | MAIN_IDX   |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | MAIN       |     1 |    65 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("S"."LINE"=1 AND "S"."CD"='E')
   5 - access("M"."ID"="S"."FK_ID")

You will access via index all rows in second with requested lineand cd (plan line 4 and 3) and for each such row you'll access via index the main table (lines 5 and 6)

This will provide an instant access to the first few rows and will work fine if there are a low number of rows in second table with the selected line and cd. In other case (when there is a large number of rows with s.cd = 'E' AND s.line = 1 - say 10k+) you will still see the first result rows quickly, but you'll wait ages to see the last row (it will take much more that the 45 seconds to finish the query).

If this is a problem you have to use a HASH JOIN (which you probaly do now).

A hash join typically doesn not use indexes and produced following execution plan

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        | 10182 |  1153K|   908   (1)| 00:00:11 |
|*  1 |  HASH JOIN         |        | 10182 |  1153K|   908   (1)| 00:00:11 |
|*  2 |   TABLE ACCESS FULL| SECOND | 10182 |    99K|   520   (2)| 00:00:07 |
|   3 |   TABLE ACCESS FULL| MAIN   | 90000 |  9316K|   387   (1)| 00:00:05 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("M"."ID"="S"."FK_ID")
   2 - filter("S"."LINE"=1 AND "S"."CD"='E')

Summary

To use the nested loops the indexes must be available as described above

The switch between nested loopsand hash join is done by the Oracle database (CBO) - provided that your tables statistics and database configuration are fine.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Thanks, the result of the query return 9 229 434 rows. I add the execution plan to the main question. I can't create the 2 index of specified because colums are already indexed.. – Bosshoss Nov 03 '20 at 06:40
  • @Bosshoss to join 8.8 M rows from `second` with a 9.2 M rows from `main` the `hash join` with `full table scan` (as in your posted execution plan) is the preferable option. You can speed up a bit using the *parallel* option. – Marmite Bomber Nov 03 '20 at 06:50