1

I have two tables A & B. A has 6,760,636 records and B has 452,175,960 records. Here is the SELECT Statement I'm using:

SELECT  /*+ parallel (T,1) */
         T.*
     FROM     TABLE_A T,
              TABLE_B P 
    WHERE     T.DESTINATION = P.DESTINATION 
          AND T.SAIL_DATE = P.SAIL_DATE 
          AND T.PACKAGE_TYPE = P.PACKAGE_TYPE 
          AND T.CABIN_CATEGORY = P.CABIN_CATEGORY 
          AND T.BOOKING_SOURCE = P.BOOKING_SOURCE 
          AND T.FARE_TYPE = P.FARE_TYPE 
          AND T.POST_DATE = P.POST_DATE;

I tried creating the index on TABLE_A, but still it is not considering the INDEX and doing a FULL TABLE SCAN.

The EXPLAIN PLAN for above is

| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                           |  6760K|  1805M|   747K  (1)|        |      |            |
|   1 |  SORT AGGREGATE              |                           |     1 |    48 |            |        |      |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| CL_PRICING_CONTROLS       |     1 |    48 |     2   (0)|        |      |            |
|   3 |    INDEX RANGE SCAN          | CL_PRICING_CONTROLS_IX3   |     1 |       |     1   (0)|        |      |            |
|   4 |  PX COORDINATOR              |                           |       |       |            |        |      |            |
|   5 |   PX SEND QC (RANDOM)        | :TQ10002                  |  6760K|  1805M|   747K  (1)|  Q1,02 | P->S | QC (RAND)  |
|   6 |    HASH JOIN                 |                           |  6760K|  1805M|   747K  (1)|  Q1,02 | PCWP |            |
|   7 |     PX RECEIVE               |                           |  6760K|  1437M|  1443   (1)|  Q1,02 | PCWP |            |
|   8 |      PX SEND HASH            | :TQ10001                  |  6760K|  1437M|  1443   (1)|  Q1,01 | P->P | HASH       |
|   9 |       PX BLOCK ITERATOR      |                           |  6760K|  1437M|  1443   (1)|  Q1,01 | PCWC |            |
|  10 |        TABLE ACCESS FULL     | TMP_RES_PRICE_CONTROL_111 |  6760K|  1437M|  1443   (1)|  Q1,01 | PCWP |            |
|  11 |     BUFFER SORT              |                           |       |       |            |  Q1,02 | PCWC |            |
|  12 |      PX RECEIVE              |                           |   450M|    23G|   746K  (1)|  Q1,02 | PCWP |            |
|  13 |       PX SEND HASH           | :TQ10000                  |   450M|    23G|   746K  (1)|        | S->P | HASH       |
|  14 |        INDEX FULL SCAN       | CL_PRICING_CONTROLS_IX1   |   450M|    23G|   746K  (1)|        |      |            |

I tried parallel query on both tables as below

SELECT  /*+ PARALLEL(T, 32)  PARALLEL(P, 32)*/
         T.*
     FROM     TABLE_A T,
              TABLE_B P 
    WHERE     T.DESTINATION = P.DESTINATION 
          AND T.SAIL_DATE = P.SAIL_DATE 
          AND T.PACKAGE_TYPE = P.PACKAGE_TYPE 
          AND T.CABIN_CATEGORY = P.CABIN_CATEGORY 
          AND T.BOOKING_SOURCE = P.BOOKING_SOURCE 
          AND T.FARE_TYPE = P.FARE_TYPE 
          AND T.POST_DATE = P.POST_DATE;

But this results in the EXPLAIN PLAN as below, where I see CPU with lower Cost, but still it is taking two hours.

| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                           |  6760K|  1805M| 59345   (1)|        |      |            |
|   1 |  SORT AGGREGATE              |                           |     1 |    48 |            |        |      |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| CL_PRICING_CONTROLS       |     1 |    48 |     2   (0)|        |      |            |
|   3 |    INDEX RANGE SCAN          | CL_PRICING_CONTROLS_IX3   |     1 |       |     1   (0)|        |      |            |
|   4 |  PX COORDINATOR              |                           |       |       |            |        |      |            |
|   5 |   PX SEND QC (RANDOM)        | :TQ10002                  |  6760K|  1805M| 59345   (1)|  Q1,02 | P->S | QC (RAND)  |
|   6 |    HASH JOIN BUFFERED        |                           |  6760K|  1805M| 59345   (1)|  Q1,02 | PCWP |            |
|   7 |     PX RECEIVE               |                           |  6760K|  1437M|  1443   (1)|  Q1,02 | PCWP |            |
|   8 |      PX SEND HASH            | :TQ10000                  |  6760K|  1437M|  1443   (1)|  Q1,00 | P->P | HASH       |
|   9 |       PX BLOCK ITERATOR      |                           |  6760K|  1437M|  1443   (1)|  Q1,00 | PCWC |            |
|  10 |        TABLE ACCESS FULL     | TMP_RES_PRICE_CONTROL_111 |  6760K|  1437M|  1443   (1)|  Q1,00 | PCWP |            |
|  11 |     PX RECEIVE               |                           |   450M|    23G| 57858   (1)|  Q1,02 | PCWP |            |
|  12 |      PX SEND HASH            | :TQ10001                  |   450M|    23G| 57858   (1)|  Q1,01 | P->P | HASH       |
|  13 |       PX BLOCK ITERATOR      |                           |   450M|    23G| 57858   (1)|  Q1,01 | PCWC |            |
|  14 |        TABLE ACCESS FULL     | CL_PRICING_CONTROLS       |   450M|    23G| 57858   (1)|  Q1,01 | PCWP |            |
APC
  • 144,005
  • 19
  • 170
  • 281
ADT
  • 43
  • 5
  • 2
    What is your question? – Matheno Nov 26 '15 at 15:38
  • 3
    When you were asking your question, there was a big orange **How to Format** box to the right of the text area with useful information in it. There was also an entire toolbar of formatting aids. And a **[?]** button giving formatting help. *And* a preview area located between the text area and the Post Your Question button (so that you'd have to scan past it to find the button) showing what your post would look like when posted. Making your post clear, and demonstrating that you took the time to do so, improves your chances of getting good answers. – T.J. Crowder Nov 26 '15 at 15:43
  • Are those `table` or `views` ? Try creating a composite index `CREATE INDEX comp_ind ON table1(DESTINATION , SAIL_DATE, PACKAGE_TYPE, CABIN_CATEGORY , BOOKING_SOURCE, FARE_TYPE, POST_DATE)`;` on both tables – Juan Carlos Oropeza Nov 26 '15 at 15:59
  • 1
    You have a query that is potentially returning billions of rows. A couple of hours is not an inordinate amount of time for such a query. – Gordon Linoff Nov 26 '15 at 16:11
  • I have added the index for Table A, and tried the same, and when observed the Explain Plan i could see that index is not been used, I also have done the DBMS_STATS.GATHER_TABLE_STATS for Table A. But no use – ADT Nov 26 '15 at 17:18
  • Why change the table names in the SELECT statement if you're not going to change them in the EXPLAIN PLAN too? You're just making it harder for us to understand your problem. – APC Dec 04 '15 at 06:11
  • Is that really the full query? I don't see anything in the query that would generate a `SORT AGGREGATE`. Other than that, the plan actually looks pretty good for these kinds of queries. Perhaps the problem is that there aren't parallel servers available and the query has been downgraded? Troubleshooting parallel queries can be painful. I recommend you go through the steps I listed in [this answer](http://stackoverflow.com/a/21132027/409172). – Jon Heller Dec 04 '15 at 08:08

1 Answers1

1

Please look at your query, specifically at its WHERE clause. It consists entirely of join conditions. It could be rewritten using the ANSI-92 syntax to this:

SELECT  /*+ parallel (T,1) */
         T.*
FROM     TABLE_A T
          inner join TABLE_B P 
on     T.DESTINATION = P.DESTINATION 
      AND T.SAIL_DATE = P.SAIL_DATE 
      AND T.PACKAGE_TYPE = P.PACKAGE_TYPE 
      AND T.CABIN_CATEGORY = P.CABIN_CATEGORY 
      AND T.BOOKING_SOURCE = P.BOOKING_SOURCE 
      AND T.FARE_TYPE = P.FARE_TYPE 
      AND T.POST_DATE = P.POST_DATE;

So every row in each table has to be considered. Clearly full table scans are the only plausible access path.

Building a composite index on all columns in the JOIN clause is unlikely to change this. You are selecting all the columns from TABLE_A, so the database needs to visit the table anyway.

Unless the number of records in the intersection between the two tables is incredibly small it will be more efficient to read the tables in multi-block reads rather than index scans with table row look-ups. As it is, you're selecting almost one-in-six of all the rows in a half-billion row table. How would an index make that faster?


Incidentally, how did you decide on the degree of parallelism? How many CPUs does your server have? What other processes are running concurrently? What is the value of MAX_PARALLEL_SERVERS? Find out more.

APC
  • 144,005
  • 19
  • 170
  • 281
  • In addition to @APC answer: the explain plan estimates that the query will return 6M rows. Are you just spooling these rows to the screen in sqlplus? If so, that could be dominating your runtime! Otherwise, what are you doing with these rows. – BobC Jan 09 '17 at 21:56