0

I want to split row each column data based on delimiter and show in multiple rows. I need fastest way to achieve this. My schema and current query to split records as below:

CREATE TABLE APP_SPECS
  (
    SPEC_ID          NUMBER PRIMARY KEY,
    SPEC_NAME        VARCHAR2(4000),
    SPEC_DESCRIPTION VARCHAR2(4000),
    SPEC_TYPE        VARCHAR2(4000)
  );
/
INSERT INTO APP_SPECS VALUES (1, 'SPEC 1' || CHR(10) || 'SPEC 2', 'SPEC DESC' || CHR(10) || 'SPEC DESC', 'TYPE 1' || CHR(10) || 'TYPE 2');
/
INSERT INTO APP_SPECS VALUES (2, 'SPEC 3' || CHR(10) || 'SPEC 4', 'SPEC DESC 3' || CHR(10) || 'SPEC DESC 4', 'TYPE 3' || CHR(10) || 'TYPE 4');
/
INSERT INTO APP_SPECS VALUES (3, 'SPEC 5' || CHR(10) || 'SPEC 6', CHR(10) || 'SPEC DESC 6', 'TYPE 5' || CHR(10) || 'TYPE 6');
/
INSERT INTO APP_SPECS VALUES (4, 'SPEC 7' || CHR(10) || 'SPEC 8' || CHR(10) || 'SPEC 9', 'SPEC DESC 7', 'TYPE 7' || CHR(10) || 'TYPE 8');
/
COMMIT;
/

Query to split using table:

WITH APP_SPECS_CTE AS
  (SELECT REGEXP_SUBSTR(REPLACE(SPEC.SPEC_NAME,CHR(10),','), '[^,]+', 1, LEVEL) SPEC_NAME ,
    REGEXP_SUBSTR(REPLACE(SPEC.SPEC_DESCRIPTION,CHR(10),','), '[^,]+', 1, LEVEL) SPEC_DESCRIPTION ,
    REGEXP_SUBSTR(REPLACE(SPEC.SPEC_TYPE,CHR(10),','), '[^,]+', 1, LEVEL) SPEC_TYPE,
    SPEC_ID
  FROM APP_SPECS SPEC
    CONNECT BY LEVEL          <= LENGTH(REPLACE(SPEC.SPEC_NAME,CHR(10),',')) - LENGTH(REPLACE(REPLACE(SPEC.SPEC_NAME,CHR(10),','), ',')) + 1
  AND PRIOR SPEC_ID            = SPEC_ID
  AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL
  )
SELECT SPEC_ID, SPEC_NAME, SPEC_DESCRIPTION, SPEC_TYPE FROM APP_SPECS_CTE;

using above query we will get below output.

| SPEC_ID | SPEC_NAME | SPEC_DESCRIPTION | SPEC_TYPE |
|---------|-----------|------------------|-----------|
| 1       | SPEC 1    | SPEC DESC        | TYPE 1    |
| 1       | SPEC 2    | SPEC DESC        | TYPE 2    |
| 2       | SPEC 3    | SPEC DESC 3      | TYPE 3    |
| 2       | SPEC 4    | SPEC DESC 4      | TYPE 4    |
| 3       | SPEC 5    | SPEC DESC 6      | TYPE 5    |
| 3       | SPEC 6    | null             | TYPE 6    |
| 4       | SPEC 7    | SPEC DESC 7      | TYPE 7    |
| 4       | SPEC 8    | null             | TYPE 8    |
| 4       | SPEC 9    | null             | null      |

I have 3 Million records in my APP_SPEC table. when i execute it's taking 5+ mins. Can any one check my query and correct me if any thing wrong.

Oracle Instance Details: Octa core processor, 64 GB RAM.

APC
  • 144,005
  • 19
  • 170
  • 281
swaroop pallapothu
  • 588
  • 1
  • 6
  • 15
  • 2
    A better solution is to change your table design. Avoid storing delimited values in a single column. Relational databases like Oracle are not meant for that and it causes unnecessary problems. Read this answer https://stackoverflow.com/a/3653574 . I know there could be legacy reasons for doing what you are currently doing. But, you should seek out ways to change it as far as you can. – Kaushik Nayak Sep 21 '18 at 07:09
  • 2
    Performance issues are one penalty for having a data model which is not in [First Normal Form](https://en.wikipedia.org/wiki/First_normal_form). – APC Sep 21 '18 at 07:19
  • 1
    Regex is slow (because it has to crunch the whole column to apply the pattern) and you've got three of them. So, why five minutes may see like a long time it's not an unreasonable amount of time to run a CPU intensive process over 3m rows. What sort of timeframe where you hoping for? – APC Sep 21 '18 at 07:22
  • @APC I want query to execute less than 10s – swaroop pallapothu Sep 21 '18 at 09:06
  • @KaushikNayak, I need to show that in UI with out splitting. Like excel data. Also, It was done a while back(may be 10 years). It takes more time to refactor in my application as per your suggestion. Do i have any alternatives? – swaroop pallapothu Sep 21 '18 at 09:08
  • "when i execute it's taking 5+ mins" Are you trying to select the entire table? Or just for particular spec_ids? – tbone Sep 21 '18 at 13:38
  • Notice the query is giving you bad results due to the regex you are using to parse the delimited string. The regex `[^,]+` does not handle NULL elements. See your output, the first row for spec_id 3 where desc 6 is in spec 5's place. Use this set of args instead to regexp_substr: `'(.*?)(,|$)', 1, LEVEL, NULL, 1`. See here for more info: https://stackoverflow.com/a/31464699/2543416. This is actually a great example of why not to use `[^,]+` – Gary_W Sep 21 '18 at 15:35

1 Answers1

0

I want query to execute less than 10s.

Querying 3 million records in less than ten seconds is pretty challenging on its own. Even before we consider the CPU-intensive regex processing. Displaying that number of rows in a UI? Not possible. But actually it's multiples of 3m because you're splitting each record into several rows. So really not possible.

But what is the real requirement here? Do your users really want to see all three million rows every time? Or are they really just going to be querying a small subset of rows, e.g. for a SPEC_ID? Because that should be possible in under ten seconds.

But if you do have to query the whole table there's no way to do that in much less time than it takes already. You will have to re-engineer your data model into First Normal Form. This may not require as much effort as you think: you can use a view to maintain the current projection over a better data model, for as long as it takes to refactor the UI.

APC
  • 144,005
  • 19
  • 170
  • 281