
I am trying to improve the performance of aggregation queries in an Oracle database. The system is used to run financial series simulations.

Here is the simplified set-up:

  1. The first table, table1, has the following columns:

    date | id | value

    It is read-only, has about 100 million rows, and is indexed on (id, date).

  2. The second table, table2, is generated by the application according to user input, is relatively small (300K rows), and has this layout:

    id | start_date | end_date | factor

After the second table is generated, I need to compute totals as follows:

    select date, sum(value * nvl(factor, 1)) as total
    from table1
    left join table2
      on table1.id = table2.id
      and table1.date between table2.start_date and table2.end_date
    group by date

My issue is that this is slow, taking up to 20-30 minutes when the second table is particularly large. Is there a generic way to speed this up, perhaps trading storage space for execution time, so that it ideally runs in under a minute?

I am not a database expert and have been reading Oracle performance tuning docs, but I was not able to find anything appropriate for this. The most promising idea I found was OLAP cubes, but I understand these would help only if my second table were fixed and I simply needed to apply different filters to the data.

user3124206

2 Answers


First, to provide any real insight, you'd need to determine the execution plan that Oracle is producing for the slow query.
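
One common way to get the plan, assuming you can run statements from SQL*Plus or a similar client, is EXPLAIN PLAN with DBMS_XPLAN (a sketch; the link in the comments below covers other options, such as capturing the actual plan of an executed statement):

    -- Ask the optimizer for its plan without executing the query
    explain plan for
    select date, sum(value * nvl(factor, 1)) as total
    from table1
    left join table2
      on table1.id = table2.id
      and table1.date between table2.start_date and table2.end_date
    group by date;

    -- Display the plan that was just generated
    select * from table(dbms_xplan.display);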

You say the second table is ~300K rows. Yes, that's small compared to 100M, but since the join between the two tables includes a range condition, it's hard to say how many rows from table1 are likely to be accessed in any given execution of the query. If a large proportion of the table is accessed but the query optimizer doesn't recognize that, the index may actually be hurting rather than helping.

You might benefit from re-organizing table1 as an index-organized table, since you already have an index that covers most of the columns. But from the information so far, all I can say is that it might help, and it might not.
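
A sketch of what that reorganization might look like, using the names from the question. One assumption to flag: an index-organized table requires a primary key, so (id, date) is assumed to be unique in table1:

    -- In an index-organized table the rows live in the primary key index
    -- itself, so a range scan on (id, date) returns value with no extra
    -- table access.
    -- NB: "date" is a reserved word; a real schema would need a different
    -- column name or quoting. The question's names are kept for clarity.
    create table table1_iot (
      id    number not null,
      date  date   not null,
      value number,
      constraint table1_iot_pk primary key (id, date)
    ) organization index;

    -- Bulk-load the existing data
    insert /*+ append */ into table1_iot
    select id, date, value from table1;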

Dave Costa
  • You can find some references about determining the execution plan here: http://stackoverflow.com/questions/14596088/getting-actual-execution-plan-in-oracle-11g – Dave Costa Dec 21 '13 at 03:37
  • Many thanks for your suggestions. I will post the plan as soon as I have access to the system; I am on holidays right now. A row from table2 accesses about 3 months' worth of daily data from table1. I will try to re-organize table1 as an index-organized table and check the impact on performance – user3124206 Dec 26 '13 at 18:28

Apart from indexes, also try the suggestions below. My two cents!

  1. Try running the query with the PARALLEL hint, employing multiple processes: /*+ PARALLEL(table1, 4) */ (see the first sketch after this list).

  2. NVL is evaluated for millions of rows, which will have some impact; is there any way the data can be organised to avoid it?

  3. When you know the dates in advance, you could split the query into two steps: first fetch the ids and date ranges from TABLE2, then join that intermediate result to TABLE1 via a view or temp table. This way the index (with id as the leading edge) is used optimally (see the second sketch below).
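
For item 1, a sketch of the hint applied to the query from the question (the degree of 4 is only an example; a suitable value depends on the hardware):

    select /*+ parallel(table1, 4) */
           date, sum(value * nvl(factor, 1)) as total
    from table1
    left join table2
      on table1.id = table2.id
      and table1.date between table2.start_date and table2.end_date
    group by date;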
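
For item 3, a minimal sketch of one possible reading of the two-step approach, assuming a global temporary table (the name table2_gtt is hypothetical) that stages the user's ranges for the session:

    -- Step 1: stage the ids and date ranges from TABLE2
    create global temporary table table2_gtt (
      id         number,
      start_date date,
      end_date   date,
      factor     number
    ) on commit preserve rows;

    insert into table2_gtt
    select id, start_date, end_date, factor from table2;

    -- Step 2: join the staged rows to TABLE1; the equality on id lets the
    -- optimizer drive range scans on the (id, date) index
    select t1.date, sum(t1.value * nvl(g.factor, 1)) as total
    from table1 t1
    left join table2_gtt g
      on t1.id = g.id
      and t1.date between g.start_date and g.end_date
    group by t1.date;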

Thanks!

Maheswaran Ravisankar
  • Concerning `PARALLEL` - this should be used with great care, and only when there are no other options that can make the query faster. [Just a quote from Tom Kyte's site](http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:39946845137685): `Parallel query is suitable for a certain class of large problems: very large problems that have no other solution. Parallel query is my last path of action for solving a performance problem; it's never my first course of action.` – Yaroslav Shabalin Dec 22 '13 at 12:45