My Oracle query takes over 1.5 min and I do not know if it's because of inefficient query writing, bad choice of indexes or some other database issue that I cannot control.
Some tables and data were changed to protect IP.
SELECT /*+ PARALLEL (AUTO) */ COUNT(DISTINCT SUD_USERID)
FROM (
SELECT /*+ PARALLEL (AUTO) */
SUD_USERID ,
CASE WHEN SCH_PAGETYPE = 'Page' AND SUD_EVENTTYPE = 'S'
THEN 'EVENTTYPE1'
WHEN SCH_PAGETYPE = 'Page' AND SUD_EVENTTYPE = 'V'
THEN 'EVENTTYPE2'
WHEN SCH_PAGETYPE = 'Hub' AND SUD_EVENTTYPE = 'S'
THEN 'EVENTTYPE3'
END AS CALC_EVENT_SOURCE,
SUD_EVENT_SOURCE
FROM
(
SELECT /*+ PARALLEL (AUTO) */
UPPER(PAGETYPE)|| '-' || SCH.ID PAGETYPE_ID ,
SCH.PAGETYPE SCH_PAGETYPE
FROM TABLE1 SCH
WHERE SCH.PAGETYPE IN ('Page', 'Hub')
AND SCH.CATEGORY_NAME NOT IN ('archive', 'testcategory')
)
INNER JOIN (
SELECT /*+ PARALLEL (AUTO) */
DISTINCT SUD.TRACEID TRACEID ,
SUD.EVENTTYPE SUD_EVENTTYPE ,
SUD.USERID SUD_USERID,
SUD.EVENT_SOURCE SUD_EVENT_SOURCE
FROM
SOMESCHEMA.USAGE_DETAILS SUD
WHERE
SUD.EVENTTYPE IN ('S', 'V')
)
ON TRACEID = PAGETYPE_ID
INNER JOIN USER_JOB_FAMILY_MAPPING SFD
ON SUD_USERID = SFD.USERID
)
WHERE CALC_EVENT_SOURCE = SUD_EVENT_SOURCE
I could not copy the text of the explain plan (generated via DBeaver) but here is a screenshot:
USAGE_DETAILS table has 3941810 rows
TABLE1 has 5908 rows
USER_JOB_FAMILY_MAPPING has 578233 rows
There are no keys on any of these tables. USAGE_DETAILS.TRACEID is VARCHAR2(500) NOT NULL has function index=SUBSTR("TRACEID",1,4) and another index declared as default but on that column.
USAGE_DETAILS.USERID is VARCHAR2(50) NOT NULL
USAGE_DETAILS.EVENTTYPE is VARCHAR2(10) NOT NULL and has default index
USAGE_DETAILS.EVENT_SOURCE is VARCHAR2(200) NOT NULL and has default index
I have tried doing inner joins on the full tables rather than the parenthetically generated (subselect?) tables, but that did not perform better and also limited my ability to use an alias in the WHERE clause.
I do not know what kind of machine this is running on, just that it's set up for development. I'd like this query to give me accurate answers in under 10s. Some times the query above though still does not return even after 10+ minutes.
Plan hash value: 2784166315
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | | 258K (1)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | 27 | | | |
| 2 | VIEW | VM_NWVW_1 | 1809K| 46M| | 258K (1)| 00:00:11 |
| 3 | HASH GROUP BY | | 1809K| 745M| | 258K (1)| 00:00:11 |
|* 4 | HASH JOIN | | 1809K| 745M| | 258K (1)| 00:00:11 |
|* 5 | TABLE ACCESS FULL | TABLE1S | 5875 | 172K| | 309 (0)| 00:00:01 |
| 6 | MERGE JOIN SEMI | | 3079K| 1180M| | 257K (1)| 00:00:11 |
| 7 | SORT JOIN | | 3079K| 1139M| | 254K (1)| 00:00:10 |
| 8 | VIEW | | 3079K| 1139M| | 254K (1)| 00:00:10 |
| 9 | HASH UNIQUE | | 3079K| 1139M| 1202M| 254K (1)| 00:00:10 |
| 10 | INLIST ITERATOR | | | | | | |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED| USAGE_DETAILS | 3079K| 1139M| | 46 (0)| 00:00:01 |
|* 12 | INDEX RANGE SCAN | IDX_UUD_EVENTTYPE | 13704 | | | 46 (0)| 00:00:01 |
|* 13 | SORT UNIQUE | | 578K| 7905K| 22M| 3558 (1)| 00:00:01 |
| 14 | INDEX FAST FULL SCAN | USERID_IDX | 578K| 7905K| | 704 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TRACEID"=UPPER("PAGETYPE")||'-'||TO_CHAR("SCH"."ID"))
filter("from$_subquery$_004"."SUD_EVENT_SOURCE"=CASE WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE1' WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='V')) THEN 'EVENTTYPE2' WHEN (("SCH"."PAGETYPE"='Hub') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE3' END )
5 - filter("SCH"."CATEGORY_NAME"<>'archive' AND "SCH"."CATEGORY_NAME"<>'testcategory' AND ("SCH"."PAGETYPE"='Hub' OR
"SCH"."PAGETYPE"='Page'))
12 - access("SUD"."EVENTTYPE"='S' OR "SUD"."EVENTTYPE"='V')
13 - access("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
filter("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation
After running exec dbms_stats.gather_table_stats(ownname=>'SCHEMA1',tabname=>'USAGE_DETAILS'); exec dbms_stats.gather_table_stats(ownname=>'SCHEMA1',tabname=>'TABLE1');
I have this new plan:
SQL> select plan_table_output from table(dbms_xplan.display());
Plan hash value: 3419946982
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | | 70152 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 27 | | | |
| 2 | VIEW | VM_NWVW_1 | 53144 | 1401K| | 70152 (1)| 00:00:03 |
| 3 | HASH GROUP BY | | 53144 | 21M| 21M| 70152 (1)| 00:00:03 |
|* 4 | HASH JOIN RIGHT SEMI | | 53144 | 21M| 14M| 65453 (1)| 00:00:03 |
| 5 | INDEX FAST FULL SCAN | USERID_IDX | 578K| 7905K| | 704 (1)| 00:00:01 |
|* 6 | HASH JOIN | | 53144 | 20M| | 62995 (1)| 00:00:03 |
| 7 | JOIN FILTER CREATE | :BF0000 | 5503 | 161K| | 309 (0)| 00:00:01 |
|* 8 | TABLE ACCESS FULL | TABLE1 | 5503 | 161K| | 309 (0)| 00:00:01 |
| 9 | VIEW | | 3549K| 1259M| | 62677 (1)| 00:00:03 |
| 10 | HASH UNIQUE | | 3549K| 159M| 203M| 62677 (1)| 00:00:03 |
| 11 | JOIN FILTER USE | :BF0000 | 3549K| 159M| | 21035 (1)| 00:00:01 |
|* 12 | TABLE ACCESS FULL| USAGE_DETAILS | 3549K| 159M| | 21035 (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("from$_subquery$_004"."SUD_USERID"="SFD"."USERID")
6 - access("TRACEID"=UPPER("PAGETYPE")||'-'||TO_CHAR("SCH"."ID"))
filter("from$_subquery$_004"."SUD_EVENT_SOURCE"=CASE WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE1' WHEN (("SCH"."PAGETYPE"='Page') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='V')) THEN 'EVENTTYPE2' WHEN (("SCH"."PAGETYPE"='Hub') AND
("from$_subquery$_004"."SUD_EVENTTYPE"='S')) THEN 'EVENTTYPE3' END )
8 - filter("SCH"."CATEGORY_NAME"<>'archive' AND "SCH"."CATEGORY_NAME"<>'testcategory' AND
("SCH"."PAGETYPE"='Hub' OR "SCH"."PAGETYPE"='Page'))
12 - filter(("SUD"."EVENTTYPE"='S' OR "SUD"."EVENTTYPE"='V') AND
SYS_OP_BLOOM_FILTER(:BF0000,"SUD"."TRACEID"))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation
37 rows selected.
Is it more likely that the compute statistics massively helped this query or that someone did something else that I was not aware of? Yes, the query ran much better, but I'd feel better too if I knew why.