4

I have a Table named DEXTRACTO and I need to consult a period of time based on column F_EXTRACTO (witch is DATE format) with a BETWEEN DATE1 AND DATE2 condition (DATE1 and DATE2 can change). Here's some data of the table:

SQL> SELECT MIN(F_EXTRACTO), MAX(F_EXTRACTO), COUNT(1)
  2    FROM DEXTRACTO
  3  /

MIN(F_EXTRACTO) MAX(F_EXTRACTO)   COUNT(1)
--------------- --------------- ----------
03/01/2005      06/01/2017        13772806


SQL> SELECT COUNT(1) FROM DEXTRACTO WHERE F_EXTRACTO IS NULL
  2  /

  COUNT(1)
----------
         0

SQL> 

I want to use an index but I don't know wich approach is better. Should I use it on column F_EXTRACTO? Or should I use an index on TRUNC(F_EXTRACTO)? I know it's not a good idea use index with function, but testing both approachs I got this...

SQL> create index INDEX_DATE on DEXTRACTO (F_EXTRACTO)
  2  /

Index created

SQL> create index INDEX_TRUNC on DEXTRACTO (TRUNC(F_EXTRACTO))
  2  /

Index created

SQL> 

Testing index on F_EXTRACTO:

SQL> explain plan for
  2  
  2  SELECT /*+ index (dextracto INDEX_DATE)  */ *
  3    FROM dextracto
  4   WHERE f_extracto
  5         BETWEEN to_date('01/01/2005','dd/mm/yyyy') AND SYSDATE
  6  /

Explained

SQL> select plan_table_output from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |    12M|  1088M|   250K|
|   1 |  FILTER                      |            |       |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEXTRACTO  |    12M|  1088M|   250K|
|   3 |    INDEX RANGE SCAN          | INDEX_DATE |    12M|       | 36972 |
---------------------------------------------------------------------------
Note
-----
   - 'PLAN_TABLE' is old version

13 rows selected

SQL> 

Testing index on TRUNC(F_EXTRACTO):

SQL> explain plan for
  2  
  2  SELECT /*+ index (dextracto INDEX_TRUNC) */ *
  3    FROM dextracto
  4   WHERE TRUNC(f_extracto)
  5         BETWEEN to_date('01/01/2005','dd/mm/yyyy') AND SYSDATE
  6  /

Explained

SQL> select plan_table_output from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             | 32437 |  2787K|  1130 |
|   1 |  FILTER                      |             |       |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEXTRACTO   | 32437 |  2787K|  1130 |
|   3 |    INDEX RANGE SCAN          | INDEX_TRUNC | 58387 |       |   169 |
----------------------------------------------------------------------------
Note
-----
   - 'PLAN_TABLE' is old version

13 rows selected

SQL> 

So... if I use index F_EXTRACTO the cost is 250000, but if I use index TRUNC(F_EXTRACTO) the cost is 1130. Can somebody tell me why exists such a big difference between this two approachs? If you need some aditional information, please tell me.

Cero Silvestris
  • 127
  • 1
  • 1
  • 8
  • As well as gathering stats, it might be interesting to look at the predicate information too - though you might need to recreate your plan table, as it's reporting that as an old version and not showing the info now. Also, an FBI on the truncated dates is only really going to be useful if your date values have times other than midnight; and even then if the start/end dates you're searching on also have non-midnight times, and you want to include their whole days. – Alex Poole Jan 09 '17 at 16:42

2 Answers2

3

Huge difference between estimated number of rows (12M vs 58387 ) is very likely attributed to the out of date statistics. I'd suggest collecting statistics (for instance, with DBMS_STATS.gather_table_stats ) after adding indexes.

Also, EXPLAIN PLAN doesn't guarantee that the plan is what will be actually used. I'd rather run query, and then check actual execution plan with dbms_xplan.display_cursor. It also makes sense to look into v$sql/v$sqlarea views for execution details.

a1ex07
  • 36,826
  • 12
  • 90
  • 103
  • Great! I didn't know about `DBMS_STATS` and `DBMS_XPLAN`. I tried that way and, indeed, both queries show same rows (13M) and same cost (283K), yet both plans are a little different. So... now I know the reason for differents costs, and can think for a better solution. Thanks, both of you! – Cero Silvestris Jan 09 '17 at 18:52
3

I want to use an index but I don't know wich approach is better.

You'll definitively not use an index to access all 14M rows from the table (as in your examples). After gathering statistics retry the explain plan without hint and you'll se FULL TABLE SCAN access with much lower cost that the index access. From the ratio of the costs INDEX / FTS you can estimate the part of the table that is worth access via index.

To access up to few months the INDEX ACCESSs could be more effective, but above some threshold FULL SCAN will be better (check both explain plan and the execution - this could give different results).

In your use case I don't see any plus on using FBI. The negative side is the non-deterministing order and support for only the daily granularity of intervals.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53