Without seeing the real data it is hard to say anything for certain. xQbert has some interesting comments; I would also recommend benchmarking and testing your real potential queries against your real data and compare your results.
The optimizer is smart, but depending on the nature and volume of data in TABLE_A
, TABLE_B
, TABLE_C
, you may get different plans.
The answer may be different, depending on the data and the query. The clearest way to know is to test.
Here's an example:
First, create the test tables and load them. For this example, I'll arbitrarily just throw 20K rows in each, with a high uniformity of data across the three.
CREATE TABLE TABLE_A(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER);
CREATE TABLE TABLE_B(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, CONDITION VARCHAR2(20));
CREATE TABLE TABLE_C(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, STATUS VARCHAR2(20));
INSERT INTO TABLE_A(YEAR) SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_B(YEAR,CONDITION)
SELECT LEVEL, DECODE(MOD(LEVEL,2),0,'TRUE','FALSE') FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_C(YEAR,STATUS)
SELECT LEVEL, DECODE(MOD(LEVEL,3),0,'X',1,'Y','Z') FROM DUAL CONNECT BY LEVEL <= 20000;
... gather stats
Then, compare your queries. First with scalars:
SELECT
A.ID
, A.YEAR
, CASE WHEN (SELECT B.ID
FROM TABLE_B B
WHERE B.ID = A.ID
AND B.YEAR = A.YEAR
AND B.CONDITION = 'TRUE') = A.ID THEN
CASE WHEN (SELECT C.STATUS
FROM TABLE_C C
WHERE C.ID = A.ID
AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
ELSE 'STATUS Y' END
ELSE CASE WHEN (SELECT C.STATUS
FROM TABLE_C C
WHERE C.ID = A.ID
AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
ELSE 'STATUS NOT FOUND' END
END AS STATUS
FROM TABLE_A A
ORDER BY 1 ASC;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 156K| 89479 (1)| 00:00:04 |
|* 1 | TABLE ACCESS BY INDEX ROWID | TABLE_B | 1 | 14 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | SYS_C00409109 | 1 | | 1 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID | TABLE_C | 1 | 10 | 2 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C00409111 | 1 | | 1 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| TABLE_C | 1 | 10 | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | SYS_C00409111 | 1 | | 1 (0)| 00:00:01 |
| 7 | SORT ORDER BY | | 20000 | 156K| 89479 (1)| 00:00:04 |
| 8 | TABLE ACCESS FULL | TABLE_A | 20000 | 156K| 11 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Statistics
-----------------------------------------------------------
4 CPU used by this session
4 CPU used when call started
11 DB time
10592 Requests to/from client
10591 SQL*Net roundtrips to/from client
80006 buffer is not pinned count
766 buffer is pinned count
117828 bytes received via SQL*Net from client
2531165 bytes sent via SQL*Net to client
2 calls to get snapshot scn: kcmgss
5 calls to kcmgcs
41127 consistent gets
...
...
etc.
And compare with a join query that returns equivalent data:
SELECT
TABLE_A.ID,
TABLE_A.YEAR,
CASE WHEN TABLE_B.CONDITION = 'TRUE'
THEN
DECODE(TABLE_C.STATUS, 'X', 'STATUS X', 'STATUS Y')
ELSE DECODE(TABLE_C.STATUS, 'Z', 'STATUS Z', 'STATUS NOT FOUND') END AS STATUS
FROM TABLE_A
LEFT OUTER JOIN TABLE_B
ON TABLE_A.ID = TABLE_B.ID
AND TABLE_A.YEAR = TABLE_B.YEAR
LEFT OUTER JOIN TABLE_C
ON TABLE_A.ID = TABLE_C.ID
AND TABLE_A.YEAR = TABLE_C.YEAR
ORDER BY 1 ASC;
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 625K| 42 (10)| 00:00:01 |
| 1 | SORT ORDER BY | | 20000 | 625K| 42 (10)| 00:00:01 |
|* 2 | HASH JOIN RIGHT OUTER| | 20000 | 625K| 40 (5)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TABLE_C | 20000 | 195K| 13 (0)| 00:00:01 |
|* 4 | HASH JOIN OUTER | | 20000 | 429K| 26 (4)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TABLE_A | 20000 | 156K| 11 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | TABLE_B | 20000 | 273K| 14 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Statistics
-----------------------------------------------------------
1 CPU used by this session
1 CPU used when call started
13 DB time
10592 Requests to/from client
10591 SQL*Net roundtrips to/from client
117622 bytes received via SQL*Net from client
2531166 bytes sent via SQL*Net to client
2 calls to get snapshot scn: kcmgss
11 calls to kcmgcs
160 consistent gets
...
...
etc.
In this example all the tables had similar rows and a 1:1 join relationship.
The queries had different plans, as would be expected.
But what if TABLE_A
had 1B rows and very few joined to TABLE_C
?
Or if TABLE_B
were 100% 'TRUE', or data is highly skewed, etc.?
Both the scalar plan and the join plan will differ against different data sets.
Throwing a temp table in the mix would surely perform differently as well (I'd be surprised if it won the day in this kind of scenario, but...)
Testing to find the best fit for your data is the surest way, especially given your comment that performance is a concern.