
I am trying to improve the performance of aggregation queries in an Oracle database. The system is used to run financial series simulations.

Here is the simplified set-up:

  1. The first table, table1, has the following columns:

    date | id | value

    It is read-only, has about 100 million rows, and is indexed on (id, date).

  2. The second table, table2, is generated by the application according to user input, is relatively small (300K rows), and has this layout:

    id | start_date | end_date | factor

After the second table is generated, I need to compute totals as follows:

    select date, sum(value * nvl(factor, 1)) as total
    from table1
    left join table2
      on table1.id = table2.id
      and table1.date between table2.start_date and table2.end_date
    group by date

My issue is that this is slow, taking up to 20-30 minutes when the second table is particularly large. Is there a generic way to speed this up, perhaps trading storage space for execution time, so that it ideally runs in under a minute?

I am not a database expert and have been reading Oracle performance tuning docs, but I was not able to find anything appropriate for this. The most promising idea I found was OLAP cubes, but I understand these would help only if my second table were fixed and I simply needed to apply different filters to the data.

user3124206

2 Answers


First, to provide any real insight, you'd need to determine the execution plan that Oracle is producing for the slow query.
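
One common way to get the plan, assuming you can run statements from SQL*Plus or a similar client, is EXPLAIN PLAN with DBMS_XPLAN (a sketch; the link in the comments below covers other options, such as capturing the actual plan of an executed statement):

    -- Ask the optimizer for its plan without executing the query
    explain plan for
    select date, sum(value * nvl(factor, 1)) as total
    from table1
    left join table2
      on table1.id = table2.id
      and table1.date between table2.start_date and table2.end_date
    group by date;

    -- Display the plan that was just generated
    select * from table(dbms_xplan.display);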

You say the second table is ~300K rows. Yes, that's small compared to 100M, but since the join between the two tables includes a range condition, it's hard to say how many rows from table1 are likely to be accessed in any given execution of the query. If a large proportion of the table is accessed but the query optimizer doesn't recognize that, the index may actually be hurting rather than helping.

You might benefit from re-organizing table1 as an index-organized table, since you already have an index that covers most of the columns. But from the information so far, all I can say is that it might help, and it might not.
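
A sketch of what that reorganization might look like, using the names from the question. One assumption to flag: an index-organized table requires a primary key, so (id, date) is assumed to be unique in table1:

    -- In an index-organized table the rows live in the primary key index
    -- itself, so a range scan on (id, date) returns value with no extra
    -- table access.
    -- NB: "date" is a reserved word; a real schema would need a different
    -- column name or quoting. The question's names are kept for clarity.
    create table table1_iot (
      id    number not null,
      date  date   not null,
      value number,
      constraint table1_iot_pk primary key (id, date)
    ) organization index;

    -- Bulk-load the existing data
    insert /*+ append */ into table1_iot
    select id, date, value from table1;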

Dave Costa
  • You can find some references about determining the execution plan here: http://stackoverflow.com/questions/14596088/getting-actual-execution-plan-in-oracle-11g – Dave Costa Dec 21 '13 at 03:37
  • Many thanks for your suggestions. I will post the plan as soon as I have access to the system; I am on holidays right now. A row from table2 accesses about 3 months' worth of daily data from table1. I will try to re-organize table1 as an index-organized table and check the impact on performance – user3124206 Dec 26 '13 at 18:28

Apart from indexes, also try the suggestions below. My two cents!

  1. Try running the query with the PARALLEL hint, employing multiple processes: /*+ PARALLEL(table1, 4) */ (see the first sketch after this list).

  2. NVL is evaluated for millions of rows, which will have some impact; is there any way the data can be organised to avoid it?

  3. When you know the dates in advance, you could split the query into two steps: first fetch the ids and date ranges from TABLE2, then join that intermediate result to TABLE1 via a view or temp table. This way the index (with id as the leading edge) is used optimally (see the second sketch below).
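
For item 1, a sketch of the hint applied to the query from the question (the degree of 4 is only an example; a suitable value depends on the hardware):

    select /*+ parallel(table1, 4) */
           date, sum(value * nvl(factor, 1)) as total
    from table1
    left join table2
      on table1.id = table2.id
      and table1.date between table2.start_date and table2.end_date
    group by date;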
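
For item 3, a minimal sketch of one possible reading of the two-step approach, assuming a global temporary table (the name table2_gtt is hypothetical) that stages the user's ranges for the session:

    -- Step 1: stage the ids and date ranges from TABLE2
    create global temporary table table2_gtt (
      id         number,
      start_date date,
      end_date   date,
      factor     number
    ) on commit preserve rows;

    insert into table2_gtt
    select id, start_date, end_date, factor from table2;

    -- Step 2: join the staged rows to TABLE1; the equality on id lets the
    -- optimizer drive range scans on the (id, date) index
    select t1.date, sum(t1.value * nvl(g.factor, 1)) as total
    from table1 t1
    left join table2_gtt g
      on t1.id = g.id
      and t1.date between g.start_date and g.end_date
    group by t1.date;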

Thanks!

Maheswaran Ravisankar
  • Concerning `PARALLEL` - this should be used with great care, and only when there are no other options that can make the query faster. [Just a quote from Tom Kyte's site](http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:39946845137685): `Parallel query is suitable for a certain class of large problems: very large problems that have no other solution. Parallel query is my last path of action for solving a performance problem; it's never my first course of action.` – Yaroslav Shabalin Dec 22 '13 at 12:45