1

I am not a query performance expert and I am learning how Oracle optimizer works on different queries and tune them for its use. Below is such query from my project where I am stuck on optimizing it for the large data set (it's slowing down for large dataset).

SELECT
     v1.id,
     v1.date_created,
     v1.name,
     v1.size
 FROM
      ver v1
     INNER JOIN (
         SELECT
             id,
             MAX(date_created) AS last_date_created
         FROM
             ver
         WHERE
             id IN (
                 ...500 ids
             )
             AND active = 'Y'
             AND archived = 'N'
         GROUP BY
             id
     ) v2 ON v1.date_created = v2.last_date_created
             AND v1.id = v2.id

I tried the SQL developer query tuning advisor, no recommendation. The problem here is it's going for full table scan on both parts of the query and not using any index and ver table contain nearly 1M records. Below is ver table script

create table ver 
 (  "Ver_id" VARCHAR2(36 BYTE) Primary key
    "NAME" VARCHAR2(255 BYTE) 
    "ACTIVE" VARCHAR2(1 BYTE) 
    "ARCHIVED" VARCHAR2(1 BYTE) 
    "DESCRIPTION" VARCHAR2(255 BYTE), 
    "ID" VARCHAR2(36 BYTE) 
    "DATE_CREATED" NUMBER(*,0)
    "CREATED_BY_USER" VARCHAR2(64 BYTE) 
    "SIZE" NUMBER(*,0)
    "LAST_MODIFIED" NUMBER(*,0))

and indexes are one nonunique index on id and one unique index on (id,name) and non unique on last_modified.

The query is taking nearly 2-3min for its execution now. Any suggestions on it.

witswickey
  • 35
  • 1
  • 2

2 Answers2

1

If you select only 2 to 3 K rows from a 1M row table you may profit from using index.

You basically wants to a) select all rows with a defined list of IDs and b) filter for each IDonly the record(s) with MAX date_created.

You only need an index on ID

create index ver_idx on ver(id);

Here are two alternative strategies that can be used:

Use analytic functions to get the recent row

In the subquery you get all rows with the specified IDs and using RANK analytic function you defines the order of the rows. Teh main query selects only rows with rn = 1, i.e. with the max(date_created).

Note that I use RANK to get the same result as your query. If there are ties on the max datum you get more records. You may use ROW_NUMBER if you want only one record even for ties.

with dt as (
select 
 id,date_created, name, "SIZE",
rank() over (partition by id order by date_created desc) rn
from ver
where id between 1 and 500
AND active = 'Y'
AND archived = 'N')
select 
  id,date_created, name, "SIZE"
from dt
where rn = 1;

You use the index to get all rows with selected IDs with additional filters to get only the rows with max date.

Use Correlated Subquery

You use correlated subquery to filter the rows with max date:

select 
 id,date_created, name, "SIZE"
from ver a
where id between 1 and 500  
AND active = 'Y'
AND archived = 'N'
AND date_created in (select max(date_created) 
                     from ver where id = a.id and active = a.active 
                              and archived = a.archived)

It is not possible to say which approach is the best. It depends on the data in your table.

Simple test, check the execution plans and find the best performing query.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
0

first of all you don't need to have index on id since you already have index on (id,name) pair.

you can have Btree index on ver(id, date_created) and bitmap index on ver(active,Archived)

you are also not using v2 in SELECT statement, thus the above query can be rewritten as

SELECT
     v1.id,
     v1.date_created,
     v1.name,
     v1.size
 FROM
      ver v1
 WHERE EXISTS (
     SELECT 1
     FROM ver v2
     WHERE v1.id = v2.id 
       and v2.id IN (
            ...500 ids
       )
       AND v2.active = 'Y'
       AND v2.archived = 'N'
     GROUP BY v2.id
     HAVING MAX(v2.date_created) = v1.date_created
 )
Derviş Kayımbaşıoğlu
  • 28,492
  • 4
  • 50
  • 72
  • Hi Simonare , Thanks for the quick response, how do we select which kind of index will be suitable for a problem? newbie for this and hence looking for performance guide for these problems. – witswickey Feb 21 '19 at 13:43
  • Bitmap index is used when your column data is not changing too often. such as Yes, No or isArchived=> 1, 0. Btree is for others such as username. – Derviş Kayımbaşıoğlu Feb 21 '19 at 13:46
  • @Simonare The problem with bitmap indexes isn't whether the data is relatively static - it's whether the data is updated by multiple sessions concurrently. And, don't forget, you have to worry about more than concurrent updates on the bitmap-indexed column: concurrent inserts and deletes on the table will also lock bitmap entries, causing serialization and deadlocks. – Matthew McPeak Feb 21 '19 at 16:17
  • @witswickey if you feel that this answer is useful for you, please consider to mark it answer – Derviş Kayımbaşıoğlu Feb 21 '19 at 17:11