0

I'm trying to improve a query and I only have a vague understanding of the explain plan. This query has very large joins and many tables. One of the improvements I tried out was breaking all the join down into smaller sub-queries and groups so that it could take advantage of joining smaller datasets as much as possible (still not complete it's a very large query). The basics structure is

select a.attr1, a..attr2, b.attr1, b.attr2, c.attr1, c.attr2, c.attr3 ...
from (select a1.attr1, a2.attr2 from tablea1 a1 join tablea2 a2 on  a1.attr1 = a2.attr1) a
left outer join (select b1.attr1, b2.attr2 from tableb1 b1 join tableb2 b2 on  b1.attr1 = b2.attr1)
left outer join (select c1.attr1, c2.attr2, c3.attr3 from tablec1 c1 join tablec2 c2 on  c1.attr1 = c2.attr1 join tablec3 c3 on  c1.attr1 = c3.attr1) c
....

When I try an explain plan I notice that there certain steps like joining the results of a to c that go from serial to parallel execution.
I understand from this http://www.oracle.com/technetwork/articles/database-performance/geist-parallel-execution-1-1872400.html that going from parallel to serial is not desired. But I don't see much documentation from google on how to get it process each step in parallel (in this case tables in the C group).
1. Is going from Serial to Parallel desired or bad?
2. Is there some hints I can give the runner to trigger parallel execution if I can guarantee that all join statements occur on partitioned/indexed columns?

Niru
  • 1,407
  • 1
  • 28
  • 47

1 Answers1

2
  1. Going from serial to parallel is probably a wasted opportunity. There are places where running in serial is better than running in parallel. But if part of a query is already running in parallel it usually makes sense to run the entire query in parallel. If any parts of the query are running in parallel the query is already incurring the various "costs" of parallelism - complexity, added resources, etc. It probably won't hurt to run it all in parallel. And when you consider Amdahl's law, it's important to worry about any serial code.
  2. Statement-level hints make it trivial to parallelize the whole thing. Making the entire statement run in parallel may be trivial. Use a statement-level parallel hint instead of an object-level hint. Write a query like select /*+ parallel(8) */ ... instead of select /*+ parallel(a) parallel_index(b) */ .... Statement-level hints are easier to write, less prone to being "ignored" when the aliases change, and apply to index scan as well as table scans.

On a side-node, breaking queries into multiple inline views may not make any performance difference. Oracle has optimizations like predicate pushing and view merging that will undo minor syntactic changes. However, breaking large queries into chunks often makes a huge difference in the ability to logically understand the query. A large number of small queries is factorially easier to understand than a small number of large queries. (For example, comparing a 6-join query with two 3-join queries - 6! > 3! + 3!.) Keep breaking up queries into small, understandable pieces; but don't expect it to improve performance.

Jon Heller
  • 34,999
  • 6
  • 74
  • 132
  • Thanks for the answer. I will try out the parallel(8) hints. I thought statement level hints for parallel didn't matter if I set a session level ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; ? – Niru Mar 08 '16 at 19:58
  • @Niru That *should* work. You might be running into one of the [many factors that determine or limit parallelism](http://stackoverflow.com/a/21132027/409172). Can you post the full query and execution plan? – Jon Heller Mar 08 '16 at 23:53
  • I cannot post the explain plan unfortunately, it's work hence the overly broad question. I learned that if using alter session parallel and the explain plain still forces serial execution of steps then the query joins are not written in a way for the optimizer to utilize parallelization. In my case I moved the serial steps into with clauses, so that when joining it does not need to incur the cost of parallel to serial and vice versa. For the generic code, tables c1,c2,c3 were themselves select statements, which is what I moved to with clauses. Thanks for the explanations, they really helped! – Niru Mar 09 '16 at 21:50