
I have a string stored as a CLOB value in a table, which I need to split into columns. The source row is inserted like this:

Insert into disp_data(id,data) values(100,
'"Project title as per the outstanding Requirements","The values are not with respect to the requirement and analysis done by the team. 
Also it is difficult to prepare a scenario notwithstanding the fact it is difficult. This user story is going to be slightly complex however it is up to the team","Active","Disabled","25 tonnes of fuel","www.examplesites.com/html.asp&net;","","","","","25"');

The CLOB value contains spaces, empty values and line breaks. I tried splitting it with:

select regexp_substr(data,'[^,]+',1,level) from disp_data
connect by regexp_substr(data,'[^,]+',1,level) is not null;

The problem is that the long text containing line breaks gets split into several rows. I thought of taking that result set and pivoting it, but could not get it to work.

I need to get this data as columns and insert it into the destination table, push_data_temp:

select pid,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11 from push_data_temp;

The CLOB column holds 11 comma-separated values that need to be pushed into this table as columns. The entire process needs to be done in a PL/SQL procedure.

The result in push_data_temp should have each of the 11 values in its own column (col1 to col11).

Any help would be much appreciated. The database is Oracle 19c.

Velocity

2 Answers


Your regular expression needs to allow for nulls, i.e. consecutive commas (but hopefully you don't have commas within any of the quoted strings...). If you have multiple source rows then it's easier to split with a recursive CTE:

with rcte (id, data, lvl, result) as (
  select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
  from disp_data
  union all
  select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
  from rcte
  where lvl <= regexp_count(data, ',')
)
select id, lvl, result
from rcte
order by id, lvl;
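To see why the pattern matters, here is a minimal sketch that runs both patterns against a hypothetical literal, 'a,,c', instead of the real table. It only illustrates how the two patterns treat an empty element; the literal and the column aliases are made up for the comparison.

```sql
-- Compare the question's pattern with the null-aware pattern
-- on a made-up three-element string whose middle element is empty.
select level as lvl,
       -- '[^,]+' needs at least one non-comma character,
       -- so the empty element is skipped and later tokens shift left
       regexp_substr('a,,c', '[^,]+', 1, level) as skips_nulls,
       -- '(.*?)(,|$)' allows an empty match before each delimiter;
       -- the final argument (1) returns only the first subexpression
       regexp_substr('a,,c', '(.*?)(,|$)', 1, level, null, 1) as keeps_nulls
from dual
connect by level <= regexp_count('a,,c', ',') + 1;

-- LVL  SKIPS_NULLS  KEEPS_NULLS
--   1  a            a
--   2  c            (null)
--   3  (null)       c
```

With the first pattern the value 'c' ends up in position 2, which would put it in the wrong column after a pivot.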

You can then pivot the result into the columns you want:

with rcte (id, data, lvl, result) as (
  select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
  from disp_data
  union all
  select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
  from rcte
  where lvl <= regexp_count(data, ',')
)
select *
from (
  select id, lvl, result
  from rcte
)
pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));

db<>fiddle
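The pivot above produces column names such as "1_COL" and "2_COL", built from the IN-list value and the aggregate alias. If friendlier names are wanted, one option is to alias the IN-list values and drop the aggregate alias. The sketch below uses a small made-up token set in place of the recursive CTE; the names tokens, col1, col2 and col3 are only illustrative.

```sql
-- Stand-in for the split result: one row per (id, position)
with tokens (id, lvl, result) as (
  select 100, 1, 'first'  from dual union all
  select 100, 2, 'second' from dual union all
  select 100, 3, null     from dual
)
select *
from tokens
-- aliasing each IN-list value names the output column directly
pivot (max(result) for lvl in (1 as col1, 2 as col2, 3 as col3));

-- ID   COL1   COL2    COL3
-- 100  first  second  (null)
```

For a positional insert the column names do not matter, so this is mostly useful when the pivoted result is queried directly.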

And you can use that directly in an insert statement:

insert into push_data_temp (pid,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11)
with rcte (id, data, lvl, result) as (
  select id, data, 1, regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1)
  from disp_data
  union all
  select id, data, lvl + 1, regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1)
  from rcte
  where lvl <= regexp_count(data, ',')
)
select *
from (
  select id, lvl, result
  from rcte
)
pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));

db<>fiddle

No PL/SQL needed, but you can still wrap it in a procedure if you want to.
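Since the question asks for a procedure, a minimal sketch of such a wrapper might look like the following. The procedure name load_push_data_temp is hypothetical, and the tokens are cast to varchar2(4000) on the assumption that data is a CLOB and that no single value exceeds that length.

```sql
-- Hypothetical wrapper; table and column names are those from the question.
create or replace procedure load_push_data_temp as
begin
  insert into push_data_temp
    (pid, col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11)
  with rcte (id, data, lvl, result) as (
    -- anchor member: first token of every source row
    select id, data, 1,
      cast(regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1) as varchar2(4000))
    from disp_data
    union all
    -- recursive member: next token, until the commas run out
    select id, data, lvl + 1,
      cast(regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1) as varchar2(4000))
    from rcte
    where lvl <= regexp_count(data, ',')
  )
  select *
  from (
    select id, lvl, result
    from rcte
  )
  pivot (max(result) as col for (lvl) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11));
  -- no commit here: the caller decides when the transaction ends
end load_push_data_temp;
/
```

It could then be called with `begin load_push_data_temp; end;` followed by a commit once the result has been checked.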


I have to take as clob and it is throwing error as inconsistent datatype

You need to cast the tokens as varchar2, which limits their length (either 4k or 32k depending on Oracle version and settings):

with rcte (id, data, lvl, result) as (
  select id, data, 1,
    cast(regexp_substr(data, '(.*?)(,|$)', 1, 1, null, 1) as varchar2(4000))
  from disp_data
  union all
  select id, data, lvl + 1,
    cast(regexp_substr(data, '(.*?)(,|$)', 1, lvl + 1, null, 1) as varchar2(4000))
  from rcte
  where lvl <= regexp_count(data, ',')
)
...

db<>fiddle with CLOB (and connect-by examples removed, as they break it...)


when i try for text with commas in between, it splits data unevenly.

That's why I said "hopefully you don't have commas within any of the quoted strings". As you don't have any really empty elements - you have ...","","... rather than ...,,... - you can skip the concern about those I suppose, and use a different pattern:

with rcte (id, data, lvl, result) as (
  select id, data, 1,
    cast(regexp_substr(data, '("[^"]*"|[^,]+)', 1, 1, null, 1) as varchar2(4000))
  from disp_data
  union all
  select id, data, lvl + 1,
    cast(regexp_substr(data, '("[^"]*"|[^,]+)', 1, lvl + 1, null, 1) as varchar2(4000))
  from rcte
  where lvl <= regexp_count(data, '("[^"]*"|[^,]+)')
)
...

db<>fiddle
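As a quick check of that pattern, the sketch below runs it against a made-up literal whose first value contains both a line break and a comma. The bracket expression `[^"]*` matches any character except a double quote, so line breaks and commas inside the quotes stay in the token. The trim call is an optional extra, not part of the answer, for the case where the surrounding quotes should not be stored.

```sql
-- Made-up sample: three quoted values, the first spanning two lines
with sample (data) as (
  select '"line one' || chr(10) || 'line two, with comma","","x"' from dual
)
select level as lvl,
       regexp_substr(data, '("[^"]*"|[^,]+)', 1, level, null, 1) as token,
       -- strips leading and trailing double quotes; "" becomes null
       trim('"' from regexp_substr(data, '("[^"]*"|[^,]+)', 1, level, null, 1)) as unquoted
from sample
connect by level <= regexp_count(data, '("[^"]*"|[^,]+)');
```

This returns three rows: the two-line value with its comma intact, an empty pair of quotes (null once unquoted), and "x".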

If you did have to deal with null elements then it's still possible, but more work. This also won't deal with escaped double-quotes within strings. At some point it will be easier to write your own parser in PL/SQL; or even to write the data to disk and read it back in as an external table which can handle all of this for you.
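As a rough idea of what such a parser could look like, here is a minimal character-by-character sketch. The function name split_csv_line is hypothetical, and it assumes that a double quote inside a quoted value is escaped by doubling it (""), which the question does not confirm. Each value is limited to 4000 bytes by the collection type, and walking a large CLOB one character at a time will be slow.

```sql
-- Hypothetical helper: splits one CSV line into a collection of values.
create or replace function split_csv_line (p_line in clob)
  return sys.odcivarchar2list
as
  l_fields sys.odcivarchar2list := sys.odcivarchar2list();
  l_field  varchar2(4000);
  l_char   varchar2(1 char);
  l_quoted boolean := false;
  l_len    pls_integer := nvl(dbms_lob.getlength(p_line), 0);
  l_pos    pls_integer := 1;
begin
  while l_pos <= l_len loop
    l_char := substr(p_line, l_pos, 1);
    if l_quoted then
      if l_char = '"' then
        if substr(p_line, l_pos + 1, 1) = '"' then
          -- doubled quote inside a quoted value: keep one literal quote
          l_field := l_field || '"';
          l_pos := l_pos + 1;
        else
          -- closing quote
          l_quoted := false;
        end if;
      else
        -- commas and line breaks inside quotes are ordinary characters
        l_field := l_field || l_char;
      end if;
    elsif l_char = '"' then
      l_quoted := true;
    elsif l_char = ',' then
      -- unquoted comma ends the current value (which may be null)
      l_fields.extend;
      l_fields(l_fields.count) := l_field;
      l_field := null;
    else
      l_field := l_field || l_char;
    end if;
    l_pos := l_pos + 1;
  end loop;
  -- the last value has no trailing comma
  l_fields.extend;
  l_fields(l_fields.count) := l_field;
  return l_fields;
end split_csv_line;
/

-- Usage: returns a,b / null / say "hi" / null / 25 as five rows
select column_value
from table(split_csv_line('"a,b","","say ""hi""",,25'));
```

The returned collection could then be assigned to the eleven target columns by index inside a procedure.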

Alex Poole
  • @Vini - I used varchar2 because db<>fiddle broke with CLOB... you just need to cast the elements first. Added that to my answer. – Alex Poole Aug 12 '20 at 13:50
  • Hi, thanks. I have one small doubt, if you have time for it. I need to do this for larger text, but when I try text with commas in it, the data is split unevenly. I have edited the data in the fiddle; can you take a look? In the col2 text I added two commas, one after "The values are" and one after "Also it is difficult to". The result on the right is split incorrectly, probably because of those commas. Can this be handled? I need to amend the query so that the commas stay inside the text. – Velocity Aug 12 '20 at 14:02
  • https://dbfiddle.uk/?rdbms=oracle_18&fiddle=140f1978cc1fff9d5e3b14d3a107d7eb – Velocity Aug 12 '20 at 14:03
  • @Vini - that's why I said "hopefully you don't have commas within any of the quoted strings", as you didn't in the sample in the question. Updated with a version that works with the new example. – Alex Poole Aug 12 '20 at 14:49
  • I know, they added to requirement at last moment. Nevertheless thanks a lot!! – Velocity Aug 12 '20 at 15:25

Enter Polymorphic Table Functions!

You can use these to dynamically convert comma-separated strings into a list of columns:

create table disp_data (
  id int, data varchar2(1000)
);
Insert into disp_data(id,data) values(100,
'"Project title as per the outstanding Requirements","The values are not with respect to the requirement and analysis done by the team. 
Also it is difficult to prepare a scenario notwithstanding the fact it is difficult. This user story is going to be slightly complex however it is up to the team","Active","Disabled","25 tonnes of fuel","www.examplesites.com/html.asp&net;","","","","","25"');
commit;

create or replace package csv_pkg as  
  /* The describe function defines the new columns */  
  function describe (  
    tab in out dbms_tf.table_t,  
    col_names varchar2  
  ) return dbms_tf.describe_t;  
  
  /* Fetch_rows sets the values for the new columns */  
  procedure fetch_rows (col_names varchar2);  
end csv_pkg;  
/

create or replace package body csv_pkg as  
  function describe(  
    tab in out dbms_tf.table_t,  
    col_names varchar2  
  )   
    return dbms_tf.describe_t as  
    new_cols dbms_tf.columns_new_t;  
    col_id   pls_integer := 2;  
  begin   
    
    /* Enable the source column for reading */  
    tab.column(1).pass_through := FALSE;  
    tab.column(1).for_read     := TRUE;  
    new_cols(1) := tab.column(1).description;  
      
    /* Extract the column names from the header string,  
       creating a new column for each   
     */  
    for j in 1 .. ( length(col_names) - length(replace(col_names,',')) ) + 1 loop   
      new_cols(col_id) := dbms_tf.column_metadata_t(  
        name=>regexp_substr(col_names, '[^,]+', 1, j),--'c'||j,   
        type=>dbms_tf.type_varchar2  
      );  
      col_id := col_id + 1;  
    end loop;  
    
    return dbms_tf.describe_t( new_columns => new_cols );  
  end;  
  
  procedure fetch_rows (col_names varchar2) as   
    rowset    dbms_tf.row_set_t;  
    row_count pls_integer;  
  begin  
    /* read the input data set */  
    dbms_tf.get_row_set(rowset, row_count => row_count);  
      
    /* Loop through the input rows... */  
    for i in 1 .. row_count loop  
      /* ...and the defined columns, extracting the relevant value   
         start from 2 to skip the input string  
      */  
      for j in 2 .. ( length(col_names) - length(replace(col_names,',')) ) + 2 loop  
        rowset(j).tab_varchar2(i) :=   
          regexp_substr(rowset(1).tab_varchar2(i), '[^,]+', 1, j - 1);  
      end loop;  
    end loop;  
      
    /* Output the new columns and their values */  
    dbms_tf.put_row_set(rowset);  
      
  end;  
    
end csv_pkg; 
/

create or replace function csv_to_columns(  
  tab table, col_names varchar2  
) return table pipelined row polymorphic using csv_pkg; 
/

with rws as (
  select data from disp_data
)
select c1, c2, c3, c4, c5, c6, c11
from   csv_to_columns ( 
  rws, 'c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11'
);

C1                   C2                             C3         C4         C5         C6                   C11       
-------------------- ------------------------------ ---------- ---------- ---------- -------------------- ----------
"Project title as pe "The values are not with respe "Active"   "Disabled" "25 tonnes "www.examplesites.co "25"      
r the outstanding Re ct to the requirement and anal                        of fuel"  m/html.asp&net;"               
quirements"          ysis done by the team.                                                                         
                     Also it is difficult to prepar                                                                 
                     e a scenario notwithstanding t                                                                 
                     he fact it is difficult. This                                                                  
                     user story is going to be slig                                                                 
                     htly complex however it is up                                                                  
                     to the team"  
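Both loops in csv_pkg work out the number of columns by counting the commas in col_names and adding one. A small sketch of that arithmetic on a made-up three-name list, alongside regexp_count as an equivalent alternative:

```sql
-- 'c1,c2,c3' has 8 characters; removing the 2 commas leaves 6,
-- so the difference (2) plus 1 gives the number of names.
select length('c1,c2,c3') - length(replace('c1,c2,c3', ',')) + 1 as col_count,
       regexp_count('c1,c2,c3', ',') + 1 as same_count
from dual;

-- COL_COUNT  SAME_COUNT
--         3           3
```

Note that the package splits with '[^,]+', so it shares the limitation discussed above: a comma inside a quoted value would be treated as a delimiter.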
Chris Saxon