
I have a table, Table_1, which looks like the following:

id   |   column1
-----------------
10   |   abc, kdm
20   |   xyz, lop, nkk

What I want is to convert the table so it looks like the following:

id   |   column1
-----------------
10   |   abc
10   |   kdm
20   |   xyz 
20   |   lop
20   |   nkk

To do this, I used a query like this:

select id, regexp_substr(column1,'[^,]+', 1, level) from Table_1 
connect by regexp_substr(column1, '[^,]+', 1, level) is not null;

This query works fine as long as the number of comma-separated values is small, but as it grows, the query takes more and more time to run.

One solution that came to my mind was to create a separate table and then insert values by iterating through the rows of Table_1.

Pseudocode as follows (roughly, as a PL/SQL block):

BEGIN
  FOR r IN (SELECT id, column1 FROM Table_1) LOOP
    FOR i IN 1 .. REGEXP_COUNT(r.column1, ',') + 1 LOOP
      -- one row per comma-separated value
      INSERT INTO new_table VALUES (r.id, REGEXP_SUBSTR(r.column1, '[^,]+', 1, i));
    END LOOP;
  END LOOP;
END;

But since this also gets slow as the number of comma-separated values grows, is there any other, more optimal way to do this (preferably without another table, though a temporary/virtual table is OK)?

I'm using Oracle SQL.

Thanks in advance.

KTB
    can you fix the normalization problem once and for all - and abandon the original TERRIBLE model? if so, then spend the one time cost to do the conversion.. it will be worth it. – Randy Aug 31 '15 at 19:02
  • You should really fix your broken design. –  Aug 31 '15 at 20:27

3 Answers


Obviously Randy's suggestion to fix the root problem would be ideal. If that is not possible, then a variety of options are available; many are listed here. In general, though, a simple solution that may improve your performance would be to find the maximum number of values in column1, create a temp table with that many columns, and then convert that temp table into your desired format. That is, have an intermediary step where the table looks like id|val1|val2|val3|..|valn.
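
One way to realize that intermediary step (a sketch of my own, assuming a known maximum of three values per row; tmp_wide is an illustrative name):

-- Pivot each row into a fixed-width layout: one indexed REGEXP_SUBSTR call per column.
CREATE TABLE tmp_wide AS
SELECT id,
       TRIM(REGEXP_SUBSTR(column1, '[^,]+', 1, 1)) AS val1,
       TRIM(REGEXP_SUBSTR(column1, '[^,]+', 1, 2)) AS val2,
       TRIM(REGEXP_SUBSTR(column1, '[^,]+', 1, 3)) AS val3
FROM Table_1;

-- Unpivot back to one row per value; UNPIVOT skips NULL cells by default.
SELECT id, val
FROM tmp_wide
UNPIVOT (val FOR pos IN (val1, val2, val3));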

Adam Martin

Normalization issues notwithstanding, your original query actually does not work. When run on your test data:

SQL> with Table_1(id, column1) as (
  2    select 10, 'abc, kdm' from dual
  3    union
  4    select 20, 'xyz, lop, nkk' from dual
  5  )
  6  select id, regexp_substr(column1,'[^,]+', 1, level) from Table_1
  7  connect by regexp_substr(column1, '[^,]+', 1, level) is not null;

        ID REGEXP_SUBSTR
---------- -------------
        10 abc
        10  kdm
        20  nkk
        20  lop
        20  nkk
        20 xyz
        10  kdm
        20  nkk
        20  lop
        20  nkk

10 rows selected.

SQL>

So, when you add more values, the problem grows exponentially, thus your performance degrades. Test by adding a third row. Also, your delimiter is comma-space, not just a comma. And the regex form used to parse the list ('[^,]+') does not work with nulls and should be avoided. Unfortunately, it's the most common regex you'll see given as an answer for parsing a list.
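
For example (my own illustration, not from the original answer): with an empty element in the list, '[^,]+' shifts later values into the wrong positions, while the subexpression form keeps positions aligned:

SELECT REGEXP_SUBSTR('a,,c', '[^,]+', 1, 2)               AS bad_2nd,   -- returns 'c' (wrong: the 2nd element is empty)
       REGEXP_SUBSTR('a,,c', '(.*?)(,|$)', 1, 2, NULL, 1) AS good_2nd   -- returns NULL (correct)
FROM dual;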

Try this instead, it should handle the larger lists with ease:

with Table_1(id, column1) as (
  select 10, 'abc, kdm' from dual
  union
  select 20, 'xyz, lop, nkk' from dual
)
SELECT id, -- column1,
       -- COLUMN_VALUE AS match_nbr,
       REGEXP_SUBSTR(column1, '(.*?)(, |$)', 1, COLUMN_VALUE, NULL, 1) AS match_value
FROM
  Table_1,
  TABLE(
    CAST(
      MULTISET(
        SELECT LEVEL
        FROM   DUAL
        CONNECT BY LEVEL <= REGEXP_COUNT(column1, ',') + 1
      ) AS SYS.ODCINUMBERLIST
    )
  );

        ID MATCH_VALUE
---------- -------------
        10 abc
        10 kdm
        20 xyz
        20 lop
        20 nkk


Basically this builds an in-line table with one row per element in the list (each row holding the element's position number) and joins it to the main table.
Uncomment the COLUMN_VALUE line to see the numbering of the elements. Note it assumes the data itself will not contain a comma.

Gary_W
  • Normalization is not going to be a good solution at this moment, as the system is a bit old and the usage of this table is pretty high, so the preferred way is to make minimal changes to the system and optimize the query. But as you've suggested, this query works pretty well. Thank you very much for the help. – KTB Sep 01 '15 at 05:38

You can try the DBMS_UTILITY.COMMA_TO_TABLE procedure for this; I guess it should be faster than user-defined solutions.

Link for documentation : http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_util.htm
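
A minimal usage sketch (my own, not from the answer; note the documented caveat that COMMA_TO_TABLE expects a comma-separated list of valid identifiers, so arbitrary data values may be rejected):

DECLARE
  l_tab DBMS_UTILITY.UNCL_ARRAY;
  l_len BINARY_INTEGER;
BEGIN
  -- Split the list into a PL/SQL index-by table.
  DBMS_UTILITY.COMMA_TO_TABLE('abc,kdm', l_len, l_tab);
  FOR i IN 1 .. l_len LOOP
    DBMS_OUTPUT.PUT_LINE(l_tab(i));
  END LOOP;
END;
/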

ivanzg