Store list of elements in multiple rows in plsql from a clob column

Question

I have two tables, TableA and TableB.

Table A has 5 columns:

create table tableA
(
    Job_id varchar2(50),
    Job_type varchar2(50),
    Employee_list clob,
    current_date date,
    salary varchar2(15)
);

Job_ID, Job_Type, Employee_list, Current_Date and Salary

Here employee_list is a clob data type.

I want to create another Table B from Table A which will have 4 columns:

create table tableB
(
    Job_id varchar2(50),
    Employee varchar2(20),
    current_date date,
    salary varchar2(15)
);

Job_ID, Employee, Current_Date and Salary

see I have adding a new column 'Employee' to store individual employee name. The problem is that the query is taking really long to execute.

My approach is as follows:-

INSERT INTO tableB (job_id, employee, current_date,salary)
SELECT job_id,
       REGEXP_SUBSTR(employee_list, '[^;]+', 1, LEVEL) as employee, current_date,salary
FROM (
select job_id,employee_list,current_date,salary from
tableA where 
job_type = 'Managerial' and current_date = sysdate  
)
CONNECT BY job_id = PRIOR job_id AND PRIOR SYS_GUID() IS NOT NULL
  AND LEVEL <= REGEXP_COUNT(employee_list, '[^;]+');

The program kept on running for hours but no solution. Note that employee_list include strings which are greater than 4000 characters.

`current_date = sysdate` is probably not comparing what you think it is comparing. `current_date` is a built-in function and you are calling the function and **not** getting the column value with the same name. If you are on a client where the session time (`current_date`) is the same as the system time (`sysdate`) then you are effectively using `1 = 1` for the filter and it will always be true. However, if a colleague in a different part of the world tries to run your query then the session time will not equal the server time and it will effectively be `0 = 1` and always false. — MT0, Jul 12 '23 at 09:43

MT0 · Answer 1 · 2023-07-12T09:41:23.983

Regular expressions are slower than simple string functions.
Generating a GUID for each row is expensive.
Correlating a hierarchical table like that will generate lots of duplicate rows that will need to be filtered out by the CONNECT BY clause which will become exponentially more expensive as the number of rows increases.

Instead, you can use simple string functions and a recursive query:

INSERT INTO tableb (job_id, employee, current_date, salary)
WITH bounds (job_id, employee_list, current_date, salary, spos, epos) AS (
  SELECT job_id,
         employee_list,
         a.current_date,
         salary,
         1,
         INSTR(employee_list, ',', 1)
  FROM   tablea a
  -- WHERE  job_type = 'Managerial'
  -- AND    a.current_date = sysdate
UNION ALL
  SELECT job_id,
         employee_list,
         b.current_date,
         salary,
         epos + 1,
         INSTR(employee_list, ',', epos + 1)
  FROM   bounds b
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY job_id SET order_id
SELECT job_id,
       TRIM(
         CASE epos
         WHEN 0
         THEN SUBSTR(employee_list, spos)
         ELSE SUBSTR(employee_list, spos, epos - spos)
         END
       ),
       b.current_date,
       salary
FROM   bounds b;

Note: current_date is a built-in function and while you can call a column current_date it is probably not advisable to do so as you will need to always prefix the column name with the table name or alias otherwise the function value will be returned in preference to the column value. Do yourself a favour and simplify things by calling the column something other than the name of a built-in function.

Which, for the sample data:

create table tableA
(
    Job_id varchar2(50),
    Job_type varchar2(50),
    Employee_list clob,
    current_date date,
    salary varchar2(15)
);

create table tableB
(
    Job_id varchar2(50),
    Employee varchar2(20),
    current_date date,
    salary varchar2(15)
); 

INSERT INTO tableA (job_id, job_type, employee_list, current_date, salary)
SELECT  10102, 'Non Managerial', 'Steven, Bob', DATE '1999-01-01', 2e4 FROM DUAL UNION ALL
SELECT 102033, 'Managerial', 'David, Charlie, Keren', DATE '1999-01-02', 3e4 FROM DUAL;

After the INSERT then tableb will contain:

JOB_ID	EMPLOYEE	CURRENT_DATE	SALARY
10102	Steven	1999-01-01 00:00:00	20000
10102	Bob	1999-01-01 00:00:00	20000
102033	David	1999-01-02 00:00:00	30000
102033	Charlie	1999-01-02 00:00:00	30000
102033	Keren	1999-01-02 00:00:00	30000

You could also use:

INSERT INTO tableB (job_id, employee, current_date,salary)
SELECT a.job_id,
       TRIM(REGEXP_SUBSTR(a.employee_list, '[^,]+', 1, l.idx)),
       a.current_date,
       a.salary
FROM   tableA a
       CROSS APPLY(
         SELECT LEVEL AS idx
         FROM   DUAL
         CONNECT BY LEVEL <= LENGTH(employee_list) - LENGTH(REPLACE(employee_list,',')) + 1
       ) l
--WHERE  a.job_type = 'Managerial'
--AND    a.current_date = sysdate;

But the recursive query is probably still going to be more efficient (even if it is more to type).

fiddle

It is giving me this error: ORA-01652: unable to extend temp segment by 128 in tablespace TEMP — SandeepChander, Jul 12 '23 at 11:17
@SandeepChander If you search the internet for that error message then you will get plenty of solutions including https://stackoverflow.com/q/25350703/1509264 and https://dba.stackexchange.com/q/75632/118722 — MT0, Jul 12 '23 at 11:21

Store list of elements in multiple rows in plsql from a clob column

1 Answers1