0

I have a table with about 18 million records. I have to read the whole table in one query because all of the records were created on one date. So, firstly I defined an index on four columns that I have a query on them using the below command:

CREATE INDEX test_etl_index ON test_table(t_date,c_num,obc,bu);

After creating indexes, I must use this query for every columns of c_num,obc,bu. The query is below:

    SELECT t_date,
       c_num,
       pd,
       pds,
       uc,
       obc,
       t_id,
       da,
       ca,
       db,
       t_time,
       ibc,
       lc,
       lt,
       sts,
       wd,
       bu
FROM   test_table
WHERE  t_date = '20170628'
       AND c_num IN (SELECT KEY
                     FROM   c_g
                     WHERE  g_id = 1);

But, for each column, the query takes about 8 minutes which is very very slow!

Would you please guide me on how I can change the query to get better performance? Any help would be appreciated.

M_Gh
  • 1,046
  • 4
  • 17
  • 43
  • What if you change that `in (select ...)` to an INNER JOIN? – Hans Kesting Jan 26 '22 at 15:26
  • The same impression for me, maybe Hash Join will help here. – loic Jan 26 '22 at 15:28
  • 2
    Please don't store dates as strings. Makes me cringe. – OldProgrammer Jan 26 '22 at 15:39
  • *Before* you start creating *indexes* check (and post) your execution [plan](https://stackoverflow.com/a/34975420/4808122). Most probably some index is your *problem*. – Marmite Bomber Jan 26 '22 at 15:58
  • Index will probably not be useful in this query because of skip scans (you filter not by leading column, so it will require to read all the data in the index). Maybe separate indexes for each column may help here, if you will use a single query with filter for each column combined via `OR` – astentx Jan 26 '22 at 15:59
  • How many rows from your 18M do you expect to get for the one particular "date" you filter? – Marmite Bomber Jan 26 '22 at 16:38
  • "date" (actually "DATE") is a reserved word, and thus not a valid column name. And it should be of type DATE, so you are either comparing a DATE to a string (bad) or you have made that column an incorrect data type (worse). – EdStevens Jan 26 '22 at 16:50
  • Thank you all for your comments. Sorry, I had mistake when writing my question, I correct name of *Date* column. Moreover, I have to define type of that column **varchar**, since it does not have just one date type; but keeps different date format, for example Arabic. – M_Gh Jan 29 '22 at 14:32
  • Also, I have tested the query with **JOIN**, but the time is not so different from this query. – M_Gh Jan 29 '22 at 15:11

4 Answers4

2

Close to no advice can be done based on the information you provided.

Except - as commended - to fix the data type for the DATE column, as the storage of dates as strings can realy confuse the optimizer.

The expected setup dependes on your data, here are some hints.

The DATE colums is Selective

In case your predicate date='20170628' (or better col_date = date'2017-06-28) returns only very few records you'll profite from an index on this column.

create index test_table_idx on test_table(col_date);

You can expect an execution plan as follows

-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |     4 |   472 |     5   (0)| 00:00:01 |
|*  1 |  HASH JOIN SEMI                      |                |     4 |   472 |     5   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TEST_TABLE     |    10 |  1120 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | TEST_TABLE_IDX |    10 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL                  | C_G            |     3 |    18 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("C_NUM"="KEY")
   3 - access("COL_DATE"=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - filter("G_ID"=1)

Note that Oracle rewrite your in (subquery) in a hash join semi, so no manual query rewrite is required.

C_NUM is Selective

If on the contrary the predicate c_num in (... returns very few records, define an index on c_num column.

create index test_table_idx2 on test_table(c_num);

You can expect an execution plan as follows

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |     4 |   480 |    28   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                |                 |     4 |   480 |    28   (4)| 00:00:01 |
|   2 |   NESTED LOOPS               |                 |    20 |   480 |    28   (4)| 00:00:01 |
|   3 |    SORT UNIQUE               |                 |     3 |    18 |     3   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL        | C_G             |     3 |    18 |     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | TEST_TABLE_IDX2 |    10 |       |     2   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID| TEST_TABLE      |     1 |   114 |    12   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter("G_ID"=1)
   5 - access("C_NUM"="KEY")
   6 - filter("COL_DATE"=TO_DATE(' 2000-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

No Selectivity

If neither of the above hold, forgot indexes and you should see an HASH JOIN SEMI that should not take as much time on a 18M table

-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            | 27273 |  3142K|  4516   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|            | 27273 |  3142K|  4516   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | C_G        |     3 |    18 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL  | TEST_TABLE | 90909 |  9943K|  4512   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("C_NUM"="KEY")
   2 - filter("G_ID"=1)
   3 - filter("COL_DATE"=TO_DATE(' 2000-01-02 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss')) 

The important point is learn how to get the execution plan of the query, how to read it and how to understand where is the bottelneck.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Dear @Marmite Bomber, thank you for your answer. I defined Index just on (t_date and c_num), but the query run as slow as before. – M_Gh Jan 29 '22 at 15:46
1

create index on date and c_num columns only rather than (date,c_num,obc,bu) Or create another index on date and c_num and call this idx2

  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 26 '22 at 16:58
0

Problem solved.

I create INDEX on columns separately, like this:

 CREATE INDEX t_date_inx ON test_table(t_date);
 CREATE INDEX c_num_inx ON test_table(c_num);

Then run the query. It run faster.

 SELECT t_date,
   c_num,
   pd,
   pds,
   uc,
   obc,
   t_id,
   da,
   ca,
   db,
   t_time,
   ibc,
   lc,
   lt,
   sts,
   wd,
   bu
   FROM   test_table
   WHERE  t_date = '20170628'
       AND c_num IN (SELECT KEY
                 FROM   c_g
                 WHERE  g_id = 1);

Hope it was useful for others.

M_Gh
  • 1,046
  • 4
  • 17
  • 43
-1

use exists instead of in()

WHERE t_date = '20170628' AND exists(select 'x' FROM c_g WHERE g_id = 1 and KEY = test_table.c_num )