ORACLE CONNECT BY LEVEL Producing Duplicate rows and creating performance issue

Question

I am running into performance issue when I use ORACLE CONNECT BY LEVEL due to duplicate records. My source table has around 50k records, in which one column (code in the example) can have multiple values separated by semicolon. Finally I should get an out with create one record per each unique value (code).

Source data:

CREATE TABLE TEST
(name varchar2(255),
code varchar2(255)
);

insert into test values ('ABC-Amherst','1179;348;403;407;408;412;413;426;428;430');
insert into test values ('ABC-Amherst','1179;352;401;403;412;413;416;420;433;A40');
insert into test values ('ABC-Ville','1179;352;401;403;412;413;416;420;433;A40');
commit;

Expected Output:

Name        Code
ABC-Amherst 348
ABC-Amherst 408
ABC-Amherst 430
ABC-Amherst 428
ABC-Amherst 426
ABC-Amherst 433
ABC-Amherst A40
ABC-Amherst 407
ABC-Amherst 352
ABC-Amherst 401
ABC-Amherst 413
ABC-Amherst 403
ABC-Amherst 420
ABC-Amherst 416
ABC-Amherst 1179
ABC-Amherst 412
ABC-Ville   412
ABC-Ville   352
ABC-Ville   A40
ABC-Ville   420
ABC-Ville   416
ABC-Ville   401
ABC-Ville   1179
ABC-Ville   433
ABC-Ville   403
ABC-Ville   413

Query used - Below query is returning duplicates. When I run it on my full dataset it runs forever and finally freezes. I tried adding distinct/row number but they are also running forever. Is there a way to prevent the duplicates or any other efficient ways to get the expected output?

 SELECT name,
   trim(regexp_substr(code,'(;|^)([^;]+)',1,level,NULL,2) ) AS code
 FROM test
 CONNECT BY prior name=name
 AND level             <= 1 + regexp_count(code,';')
 AND prior sys_guid()  IS NOT NULL;

The linked duplicate has multiple solutions and a discussion of their relative performance. (Just use `;` instead of `,`) — MT0, Nov 09 '20 at 19:10
For example [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=aad18992204fe37ae11483743763d332). — MT0, Nov 09 '20 at 19:19

GMB · Answer 1 · 2020-11-09T21:37:11.270

0

I wonder whether a json approach would offer better performance:

select distinct t.name, x.code
from test t
cross apply json_table(
    '["' || replace(t.code, ';', '","') || '"]',
    '$[*]' columns (code varchar(50) path '$')
) x
order by t.name, x.code

Depending on what you call a duplicate, it might be acceptable to deduplicate in the lateral join rather than in the outer query (I am unsure that's what you want, but still showing how you would proceed):

select t.name, x.code
from test t
cross apply (
    select distinct code
    from json_table(
        '["' || replace(t.code, ';', '","') || '"]',
        '$[*]' columns (code varchar(50) path '$')
    ) x
) x
order by t.name, x.code

Demo on DB Fiddle

edited Nov 09 '20 at 21:37

answered Nov 09 '20 at 18:49

GMB

216,147
25
84
135

Thank you. First query result is what I am expecting. I never used JSON functions. When I ran your query I got the this error. My oracle version is 12c. ORA-40441: JSON syntax error 40441. 00000 - "JSON syntax error" *Cause: The provided JavaScript Object Notation (JSON) data had invalid syntax and could not be parsed. *Action: Provide JSON data with the correct syntax. – Sai Pasnuri Nov 09 '20 at 19:06
@SaiPasnuri: this might indicate that you have badly formatted CSV strings. I edited my answer for the query to accept strings in the CSV elements rather than numbers only. Does it work better? – GMB Nov 09 '20 at 21:38

ORACLE CONNECT BY LEVEL Producing Duplicate rows and creating performance issue

1 Answers1