2

I have an SQL query that reuses a complex subquery in two different places across a union

SELECT A.PROJECT_ID, B.INFO B.MORE_INFO
FROM A
INNER JOIN (
     -- Complex subquery
) B ON A.PROJECT_ID = B.PROJECT_ID
WHERE -- Filter to a disjoint subset --

UNION

SELECT C.PROJECT_ID, B.INFO B.MORE_INFO
FROM C
INNER JOIN (
     --  SAME Complex subquery
) B ON C.PROJECT_ID = B.PROJECT_ID
WHERE -- Filter to a disjoint subset --

In order to reduce repeated code, I'd like to use the WITH clause so I only have to write the complex subquery once.

WITH B AS (
    -- Complex subquery
)

SELECT A.PROJECT_ID, B.INFO B.MORE_INFO
FROM A
INNER JOIN B ON A.PROJECT_ID = B.PROJECT_ID
WHERE -- Filter to a disjoint subset --

UNION

SELECT C.PROJECT_ID, B.INFO B.MORE_INFO
FROM C
INNER JOIN B ON A.PROJECT_ID = B.PROJECT_ID
WHERE -- Filter to a disjoint subset --

This works, but now the query takes 4 times longer to execute. Looking at the explain plan for both of these queries, I think I know what's going on: The WITH clause is creating a temporary table that's not indexed, so when joining A and B, a full table scan is being executed. In the plan for the original query, a pushdown predicate is being used, reducing the work needed for the join. This corroborates with this modern SQL article I found talking about how WITH clauses can adversely impact performance.

Given all of this, is there a way where I can use the WITH clause more like a text macro a la C's preprocessor without creating a temporary table? Maybe there is a plan hint I could use? Or is there possibly some other native feature in Oracle or SQL that allows me to do the same thing?

In the end, I'd like to see Oracle do the same plan as the original query, while also taking advantage of the code modularity the WITH clause offers

Some notes:

  • Oracle 12c
  • I cannot eliminate the UNION
  • The two WHERE filters are different, resulting with two disjoint sets
Charlie
  • 23
  • 5
  • Nice job on the question. Great first post. I wish I had a good answer for this. For a similar problem, I had a PLSQL that would output the statements from files (or could be any other source) and then use EXECUTE IMMEDIATE to get the results. Maybe not ideal, but it can reduce the maintenance risk. – Dennis Feb 11 '21 at 17:49
  • 1
    The simplest thing you can try is the `inline` hint. Alas, for some reason that I personally do not understand, this hint is **not documented** (just like its counterpart, the `materialize` hint), even though they are widely used. Depending on your organization's rules, you may or may not be able to use this solution. I assume you know about hints, but didn't know about this particular one; if you need more help with it please say so. In any case, you may want to try it in a test environment to see what **could** be done. –  Feb 11 '21 at 17:55
  • Separate question though - do you really need `UNION`, or would `UNION ALL` work as well? –  Feb 11 '21 at 17:58
  • Could you post *both* execution plans (see [here](https://stackoverflow.com/a/34975420/4808122) how to get them in a text form - you may anonymize tables etc if in doubt). Important is to know if you are joining with nested loop or hash join. (is it same in both queries?). Also tell us about the size of the tables A,B and C and how many rows are returned by the *filters* (I guess very small number). Without this information all is speculation and claiming *WITH is causing problem* without saying *why* is not very responsible. But the question is good (+1). – Marmite Bomber Feb 11 '21 at 18:49
  • @MarmiteBomber - the speculation can be answered very easily by trying the query with and without the `inline` hint, even if it can't be used in production. Much easier than what you suggested. –  Feb 11 '21 at 19:45
  • @mathguy inline worked!! I'm getting similar performance now. I definitely could use `UNION ALL` though in my case decided to keep it since the optimizer changes it to a UNION ALL anyway. @Dennis thank you! – Charlie Feb 11 '21 at 19:51
  • @MarmiteBomber I've left out the plan because my real query is much more complex than in my post. Yes, the rows returned by filters are small compared to the total rows. Additionally, I've done further testing by writing both queries where the only difference is the WITH clause. That, in addition to mathguy's suggestion has led me to this conclusion. – Charlie Feb 11 '21 at 20:11
  • @mathguy if you like, you can create an answer with the `inline` solution and I'll mark it as the answer. Should I re-add "Oracle" to my title though since this is an Oracle specific solution? – Charlie Feb 11 '21 at 20:18
  • Normally I don't post such short comments as "answers" (I think the whole "reputation" thing is laughable, so I don't care about that aspect). However, this may be a relatively common question, so I made an exception - I just copied the comment and posted it as an answer. Yes, Oracle in the title makes perfect sense; I think the question itself, not just the solution, is Oracle-specific, since it's about how the Oracle optimizer works, not about the theoretical `with` clause (supported in many SQL dialects). –  Feb 11 '21 at 20:26

2 Answers2

2

The simplest thing you can try is the inline hint.

Alas, for some reason that I personally do not understand, this hint is not documented (same as its counterpart, the materialize hint), even though they are widely used. Depending on your organization's rules, you may or may not be able to use this solution. I assume you know about hints, but didn't know about this particular one; if you need more help with it please say so. In any case, you may want to try it in a test environment to see what could be done.

EDIT (since others may read this as well):

The with clause should look like this:

with
  b as ( select /*+ inline */ ...... complex query ....... )
.......
0

This answers the original version of the question.

As you have structured the query, you can use or:

SELECT A.PROJECT_ID, B.INFO B.MORE_INFO
FROM A INNER JOIN
     (
     -- Complex subquery
     ) B
     ON A.PROJECT_ID = B.PROJECT_ID
WHERE -- Filter to a disjoint subset -- OR
      -- Filter to a disjoint subset --

If you need to remove duplicates, you can add SELECT DISTINCT.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I cannot do an OR clause because the second set actually comes from a different source. I need to have a Source column (e.g. "SourceA" as Source), so I need to use the UNION. I left this detail out in the interest of brevity. – Charlie Feb 11 '21 at 17:27
  • 1
    @Charlie - Gordon is right. You **can** use `or`, the only question is "what's the correct way". You can perform the `union` first (even after adding a `source` column to both members, to identify each source), and then the join condition can be conditional: `join... on source = 'A' and (the join condition for 'A') OR source = 'C' and (the join condition for 'C')` –  Feb 11 '21 at 18:10