Extracting values from a pipe separated string

Question

I have a string which looks like --

12361_BBMS_GTECHL|12362_BBMS_PRIM|12363_BBMS_SEC|....and so on

So i need to fetch

12361 and BBMS_GTECHL
12362 and BBMS_PRIM
12363 and BBMS_SEC

i used --

select *
  FROM
  TABLE(XMLSEQUENCE(
        EXTRACT(
            XMLTYPE('<rowset><row><Code>'||
                replace(replace('12361=BBMS_GTECHL|12362=BBMS_PRIM','|','</Value></row><row><Code>'),'=','</Code><Value>')||'</Value>'||'</row></rowset>'),'/rowset/row')));


declare
  l_val varchar2(1000);
begin
  select substr('12361_BBMS_GTECHL|12362_BBMS_PRIM', instr('|')+1)  into     l_val from dual;
  dbms_output.put_line(l_val);
end;

But getting problem in getting desired result ! I have need to write this logic in a package that i will do if i got some hint here.

My DB version is --

 Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

Possible duplicate of [this](http://stackoverflow.com/questions/14328621/splitting-string-into-multiple-rows-in-oracle) — Aleksej, May 16 '16 at 14:20
Not really a duplicate - splitting a pipe-separated string is indeed a duplicate, but in this problem there are two tokens per pipe-separated string, separated in turn by the first underscore (where there may be more underscores). — , May 16 '16 at 16:16

score 2 · Answer 1 · answered May 16 '16 at 16:15

Here is a solution using a recursive factored subquery ("recursive CTE"). Note the use of pointers to the location of pipe symbols and the first underscore after each pipe (disregarding the other underscores). Also, the solution uses only standard INSTR and SUBSTR, avoiding the use of regular expressions (which perform somewhat slower - important if you process lots of data).

with input_data (input_str) as (
          select '12361_BBMS_GTECHL|12362_BBMS_PRIM|12363_BBMS_SEC' from dual
     ),
     t (str) as (
        select '|' || input_str || '|' from input_data
     ),
     r (lvl, code, descr, str, p1_from, p2_from, p1_to, p2_to) as (
        select 0, null, null, str, 1, 1, instr(str, '_', 1, 1), instr(str, '|', 1, 2) 
          from t
        union all
        select lvl+1, substr(str, p2_from + 1, p1_to - p2_from - 1), 
                      substr(str, p1_to + 1, p2_to - p1_to - 1),
                      str, p1_to, p2_to, instr(str, '_', p2_to + 1, 1), 
                                         instr(str, '|', p2_to + 1, 1) 
          from r
          where p1_to != 0
     )
select code, descr
from   r
where  lvl != 0;

Output:

CODE    DESCR
------- --------------------
12361   BBMS_GTECHL
12362   BBMS_PRIM
12363   BBMS_SEC

Thanks Mathguy, Your query runs perfectly. I need to use this logic in my package where i process lots of data for migration. I want to know can i achieve this using XML parsing i believe which is faster. — mradul, May 17 '16 at 05:43

score 0 · Answer 2 · answered May 18 '16 at 07:13

If I were you and my primary consideration was performance i would use Table Functions. mathguys solution works perfectly but it will be more performant if we use a pipelined function.

First we create types which are necessary for our function.

drop type type_test_table;

drop type type_test_row;

CREATE TYPE type_test_row AS OBJECT (
  code           varchar2(2000),
  descr  VARCHAR2(50)
)
/

CREATE TYPE type_test_table IS TABLE OF type_test_row
/

Then we create our function :

create or replace function test_pipe_func return type_test_table pipelined as
cursor c_data_in is 
select '12361'||level||'_BBMS_GTECHL'||level||'|12362'||level||'_BBMS_PRIM'||level||'|12363'||level||'_BBMS_SEC'||level||'|12364'||level||'_BBU_SEC'||level as str from dual
connect by level <= 1000000;
v_element varchar2(300);
v_code varchar2(100);
v_descr varchar2(200);
p_deb number;
p_fin number;
begin 
    for l_data_in in c_data_in loop
        p_deb := 0;
        p_fin := 1;
        while p_fin > 0 loop
            p_fin := case when p_deb = 0 then instr(l_data_in.str,'|',1, 1) else instr(l_data_in.str,'|',p_deb-1, 2) end;
            p_deb := case when p_deb = 0 then 1 else instr(l_data_in.str,'|',p_deb-1, 1)+1 end;
            v_element := case when p_fin = 0 then substr(l_data_in.str, p_deb) else substr(l_data_in.str, p_deb, p_fin - p_deb) end;
            p_deb := p_fin +1;
            v_code := substr(v_element, 1 , instr(v_element, '_' , 1,1)-1);
            v_descr := substr(v_element, instr(v_element, '_' , 1,1)+1);
            pipe row(type_test_row(v_code, v_descr));
        end loop;

    end loop;
end test_pipe_func;
/

I changed the test case a little bit to be able to generate as many lines as necessary for my tests. And i used a pipelined function to limit usage of Process memory in case of big datasets and to be able to use it with a select. If your use case is different(i don't know maybe to insert into a table using the input) another option can be to use bulk collect into and forall.

create or replace procedure test_bulk_collect_proc as
cursor c_data_in is 
select '12361'||level||'_BBMS_GTECHL'||level||'|12362'||level||'_BBMS_PRIM'||level||'|12363'||level||'_BBMS_SEC'||level as str from dual
connect by level <= 1000000;
type type_table_data_in is table of c_data_in%rowtype;
table_data_in type_table_data_in;

v_element varchar2(300);
v_code varchar2(100);
v_descr varchar2(200);
p_deb number;
p_fin number;
v_str varchar2(4000);
v_t_insr type_test_table;

limit_in number := 100000;
i number;
begin 
   OPEN c_data_in;
    LOOP
        FETCH c_data_in BULK COLLECT INTO table_data_in LIMIT limit_in;
        v_t_insr := type_test_table();
        i := 1;
        for indx IN 1 .. table_data_in.COUNT LOOP
            v_str := table_data_in(indx).str;
            p_deb := 0;
            p_fin := 1;
            while p_fin > 0 loop
                p_fin := case when p_deb = 0 then instr(v_str,'|',1, 1) else instr(v_str,'|',p_deb-1, 2) end;
                p_deb := case when p_deb = 0 then 1 else instr(v_str,'|',p_deb-1, 1)+1 end;
                v_element := case when p_fin = 0 then substr(v_str, p_deb) else substr(v_str, p_deb, p_fin - p_deb) end;
                p_deb := p_fin +1;
                v_code := substr(v_element, 1 , instr(v_element, '_' , 1,1)-1);
                v_descr := substr(v_element, instr(v_element, '_' , 1,1)+1);
                v_t_insr.extend;
                v_t_insr(i) := type_test_row(v_code, v_descr);
                i:= i+1;
            end loop;
        END LOOP;

        forall t in v_t_insr.first..v_t_insr.last
        insert into test_bbu(CODE, DESCR) values (v_t_insr(t).code, v_t_insr(t).descr);

        EXIT WHEN table_data_in.COUNT < limit_in;

   END LOOP;
End;
/

I tested all three methods on my database. To test the sql of mathguy and the pipelined function i used CTAS and for the bulk collect into i simply executed the procedure.

create table test_bbu as 
with input_data (input_str) as (
              select '12361'||level||'_BBMS_GTECHL'||level||'|12362'||level||'_BBMS_PRIM'||level||'|12363'||level||'_BBMS_SEC'||level from dual
          connect by level <= 1000000
     ),
     t (str) as (
        select '|' || input_str || '|' from input_data
     ),
     r (lvl, code, descr, str, p1_from, p2_from, p1_to, p2_to) as (
        select 0, null, null, str, 1, 1, instr(str, '_', 1, 1), instr(str, '|', 1, 2) 
          from t
        union all
        select lvl+1, substr(str, p2_from + 1, p1_to - p2_from - 1), 
                      substr(str, p1_to + 1, p2_to - p1_to - 1),
                      str, p1_to, p2_to, instr(str, '_', p2_to + 1, 1), 
                                         instr(str, '|', p2_to + 1, 1) 
          from r
          where p1_to != 0
     )
select code, descr
from   r
where  lvl != 0;

create table test_bbu2 as
select * from table(test_pipe_func);

execute test_bulk_collect_proc;

I tested three methods with 500K and 1M lines. Here are my results but i urge you to test on your environnement before you make your decision.

                 500K          1M
----------------------------------------
SQL               36s          1m:15s
Pipelined         11s          23s
Bulk Collect      8s           17s

Extracting values from a pipe separated string

2 Answers2