0

I have over 10,000 records in class_period table. When I run the query shown below, it takes too much time to fetch the data.

Can you please help me - how can I speed up the query?

 WITH DATA AS
     ( SELECT distinct class_time , class_id   
       from class_period       
       
      )
   SELECT distinct class_id, trim(regexp_substr(class_time, '[^:]+', 1, LEVEL)) class_time
    FROM DATA
    CONNECT BY regexp_substr(class_time , '[^:]+', 1, LEVEL) IS NOT NULL

sample data attached as image enter image description here

required data attached as image enter image description here

i am using oracle 11g.

  • Please edit your question and include table layouts and sample data for the `CLASS_PERIOD` table, as well as the results of an `EXPLAIN PLAN` for the query. Thanks. – Bob Jarvis - Слава Україні Oct 25 '20 at 20:48
  • @BobJarvis-ReinstateMonica , thanks for your reply i have attached sample data as image and required data as image – – abdul raheem dev Oct 25 '20 at 21:06
  • Consider why it matters that the result of this query is fast. Do you really need the values for all 10,000 rows or do you only need to know about some? General advice would be to store the data in a more performance friendly way and only have one value for class_time per row - you might need an additional table for this. – Andrew Sayer Oct 25 '20 at 21:10
  • Please read [this post on asking Oracle tuning questions](https://stackoverflow.com/a/34975420/146325). – APC Oct 25 '20 at 21:14
  • @abdulraheemdev - thanks for your changes. In future, please keep in mind that people prefer code and data as **text**, not as images. Text can be copied, pasted, and edited easily, which makes it more likely that you'll get an answer as many people will not re-key code or data from images. – Bob Jarvis - Слава Україні Oct 26 '20 at 01:52

2 Answers2

1
  1. Fix your query so you don't need to use DISTINCT. The problem with your method is that you are using a hierarchical query with multiple rows of input and no way of correlating each level of the hierarchy to the previous level so the query will correlate it to ALL the items at the previous level of the hierarchy and you will get exponentially more and more duplicate rows generated at each depth. This is incredibly inefficient.
  2. Change from using regular expressions to simple string functions.

Instead you can use:

WITH bounds ( class_id, class_time, start_pos, end_pos ) AS (
  SELECT class_id,
         class_time,
         1,
         INSTR( class_time, ':', 1 )
  FROM   data
UNION ALL
  SELECT class_id,
         class_time,
         end_pos + 1,
         INSTR( class_time, ':', end_pos + 1 )
  FROM   bounds
  WHERE  end_pos > 0
)
SELECT class_id,
       CASE end_pos
       WHEN 0
       THEN SUBSTR( class_time, start_pos )
       ELSE SUBSTR( class_time, start_pos, end_pos - start_pos )
       END AS class_time
FROM   bounds;

Which, for the sample data:

CREATE TABLE data ( class_id, class_time ) AS
SELECT 1, '0800AM:0830AM' FROM DUAL UNION ALL
SELECT 1, '0900AM' FROM DUAL UNION ALL
SELECT 2, '0830AM:0900AM:0930AM' FROM DUAL UNION ALL
SELECT 2, '1000AM' FROM DUAL;

Outputs:

CLASS_ID | CLASS_TIME
-------: | :---------
       1 | 0800AM    
       1 | 0900AM    
       2 | 0830AM    
       2 | 1000AM    
       1 | 0830AM    
       2 | 0900AM    
       2 | 0930AM    

db<>fiddle here

However, an even better method would be to change your model for storing the data and stop storing it as a delimited string and instead store it in a separate table or, maybe, as a collection in a nested table.

An example using a second table is:

CREATE TABLE data (
  class_id   NUMBER PRIMARY KEY
);

CREATE TABLE class_times (
  class_id   NUMBER REFERENCES data ( class_id ),
  class_time VARCHAR2(6)
);

INSERT ALL
  INTO data ( class_id ) VALUES ( 1 )
  INTO data ( class_id ) VALUES ( 2 )
  INTO class_times ( class_id, class_time ) VALUES ( 1, '0800AM' )
  INTO class_times ( class_id, class_time ) VALUES ( 1, '0830AM' )
  INTO class_times ( class_id, class_time ) VALUES ( 1, '0900AM' )
  INTO class_times ( class_id, class_time ) VALUES ( 2, '0830AM' )
  INTO class_times ( class_id, class_time ) VALUES ( 2, '0900AM' )
  INTO class_times ( class_id, class_time ) VALUES ( 2, '0930AM' )
  INTO class_times ( class_id, class_time ) VALUES ( 2, '1000AM' )
SELECT * FROM DUAL;

Then your query would be (assuming that you need other columns from data alongside the class_id):

SELECT d.class_id,
       c.class_time
FROM   data d
       INNER JOIN class_times c
       ON ( d.class_id = c.class_id );

Which outputs:

CLASS_ID | CLASS_TIME
-------: | :---------
       1 | 0800AM    
       1 | 0830AM    
       1 | 0900AM    
       2 | 0830AM    
       2 | 0900AM    
       2 | 0930AM    
       2 | 1000AM    

An example using a nested table is:

CREATE TYPE stringlist IS TABLE OF VARCHAR2(6);

CREATE TABLE data (
  class_id   NUMBER,
  class_time stringlist
) NESTED TABLE class_time STORE AS data__class_time;

INSERT INTO data ( class_id, class_time )
SELECT 1, stringlist( '0800AM','0830AM' ) FROM DUAL UNION ALL
SELECT 1, stringlist( '0900AM' ) FROM DUAL UNION ALL
SELECT 2, stringlist( '0830AM','0900AM','0930AM' ) FROM DUAL UNION ALL
SELECT 2, stringlist( '1000AM' ) FROM DUAL;

Then your query would become:

SELECT d.class_id,
       ct.COLUMN_VALUE AS class_time
FROM   data d
       CROSS APPLY TABLE ( d.class_time ) ct

Which outputs:

CLASS_ID | CLASS_TIME
-------: | :---------
       1 | 0800AM    
       1 | 0830AM    
       1 | 0900AM    
       2 | 0830AM    
       2 | 0900AM    
       2 | 0930AM    
       2 | 1000AM    

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
0

MT0 spotted the big problem of your connect by filter allowing all rows to be read. You don't need to convert it to a recursive CTE, since you're already distincting all the columns you're projecting, that can be treated as your primary key (assuming it's not nullable or you don't want the null values).

You also need a special filter so that it doesn't get confused into thinking you've got an infinite loop.

 WITH DATA AS
     ( SELECT distinct class_time , class_id   
       from class_period       
      )
   SELECT distinct class_id, trim(regexp_substr(class_time, '[^:]+', 1, LEVEL)) class_time
    FROM DATA
    CONNECT BY regexp_substr(class_time , '[^:]+', 1, LEVEL) IS NOT NULL
and prior class_time = class_time
and prior class_id = class_id
and prior sys_guid() is not null

The prior sys_guid() is not null is the special filter to prevent it from erroring with ORA-01436: CONNECT BY loop in user data.

This should perform similarly to the recursive CTE.

Andrew Sayer
  • 2,296
  • 1
  • 7
  • 9
  • Better would be to use `CROSS APPLY` [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=7bad05d56866cb216ad2771d1d91f112). I know why your method works but I have issues that `SYS_GUID()` is never `NULL` so your last `CONNECT BY` filter will always be true and should not be necessary (or should be able to be replaced with `1=1`) but it is necessary for the proper working of the query as it prevents the cross correlation of rows. – MT0 Oct 25 '20 at 22:12
  • Also, there is a discussion of performance in [this question](https://stackoverflow.com/a/38372145/1509264). – MT0 Oct 25 '20 at 22:15
  • Noone likes to use `sys_guid` but it does the trick (and trick is exactly the right word). You're right, I didn't realize how badly the connect by performs here (although the regexp doesn't particularly help), the batching that recursive CTEs are able to do is just so efficient when you have huge numbers of rows to start from. – Andrew Sayer Oct 26 '20 at 18:26