1

I have a table

table1

f_name     f_content
test1.txt  YL*1**50*1~
           RX*1~
           LR*2~
test2.txt  YL*1**49*1~
           EE*1~
           WW*2~
  • f_content is CLOB
  • f_name is varchar2 (4000)

I have written this SQL:

SELECT
    d.*,
    c.line_num,
    translate(substr(d.f_content, part1 + 1, part2 - part1), ' ~'
                                                                || CHR(10)
                                                                || CHR(13), ' ') line
FROM
    table1 d
    CROSS JOIN LATERAL (
        SELECT
            level   line_num,
            DECODE(level, 1, 0, regexp_instr(d.f_content, '~', 1, level - 1)) part1,
            DECODE(regexp_instr(d.f_content, '~', 1, level), 0, length(d.f_content), regexp_instr(d.f_content, '~', 1, level
            )) part2
        FROM
            dual
        CONNECT BY
            level <= regexp_count(d.f_content, '~ ')
    ) c;

My expected output is:

f_name     f_content        line_num    line
test1.txt  YL*1**50*1~      1           YL*1**50*1
           RX*1~
           LR*2~
test1.txt  YL*1**50*1~      2           RX*1
           RX*1~
           LR*2~
test1.txt  YL*1**50*1~      3           LR*2
           RX*1~
           LR*2~


test2.txt  YL*1**49*1~      1           YL*1**49*1
           EE*1~
           WW*2~
test2.txt  YL*1**49*1~      2           EE*1
           EE*1~
           WW*2~
test2.txt  YL*1**49*1~      3           WW*2
           EE*1~
           WW*2~

However in the output based on the SQL above, I am only getting line_num = 1.

How can I get the SQL code working so it gives all the lines?

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
dang
  • 2,342
  • 5
  • 44
  • 91
  • [Here](https://stackoverflow.com/a/38372145/7998591)'s a good answer with discussions on the performance of various solutions to the problem. Do take a look at other answers as well. If possible compare the performance of those solutions with your data to arrive at a better solution to suit your needs. – Kaushik Nayak May 04 '19 at 15:02
  • @KaushikNayak - I read the above answer. I want to create a new table using this. Is there a way to make execution faster? Add parallel hints? – dang May 04 '19 at 15:55
  • So, you want create a new table to store the values separately using the output of the query or store values separated by `~` in a single column? If the latter is true then I would suggest you not to go ahead with it. If you could control the data at source to make it more structured( to be able to store the values separately ) as relational database rules recommend, you'll be in a better position to optimize queries on any volume of data, given enough resources. Without that, a Hint would do no good as an unstructured data is still harder to be retrieved. – Kaushik Nayak May 04 '19 at 16:12
  • I want to create new table to store the values from the output of the SQL query. – dang May 04 '19 at 16:13
  • Moreover, `CLOB`s are generally slower than normal character types( `VARCHAR2`). So, there is something fundamentally wrong in the way data is stored. As I mentioned, Instead of having entire unstructured data(`f_content`) as a `CLOB` column, you could very well have received it as a standard CSV or XML or JSON formats, with an agreed upon pre-defined structure .You would then have several built-in functions/ APIs to extract the required data efficiently. It is really sub-optimal to store them as `CLOB` and then try to transform it from there. – Kaushik Nayak May 04 '19 at 16:24

1 Answers1

1

You can use a hierarchical query without a JOIN condition :

select t1.*, level as line_num, 
       regexp_replace( regexp_substr( t1.f_content,'[^~]+', 1, level), '(^[[:space:]]+)' ) 
       as line
  from table1 t1
 connect by level <= regexp_count(f_content, '~')   
    and prior f_name = f_name
    and prior sys_guid() is not null

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
  • I have over 30K records with more than 1000 lines in each. Would this query be faster? – dang May 04 '19 at 14:25
  • @dang I didn't test, I don't know dear Dhananjay. – Barbaros Özhan May 04 '19 at 14:26
  • Ok, how can I remove trailing and leading spaces? – dang May 04 '19 at 14:27
  • Also, should I be considering CHR(10) and CHR(13) characters? – dang May 04 '19 at 14:28
  • you can replace `replace(t1.f_content,' ')` with `t1.f_content` inside `regexp_substr` expression. And we don't need CHR(10) or CHR(13). – Barbaros Özhan May 04 '19 at 14:41
  • It will replace all the spaces in the string. I just want to replace trailing and leading spaces. – dang May 04 '19 at 14:42
  • @dang : To replace trailing and leading spaces, simply using `TRIM` function should work. – Kaushik Nayak May 04 '19 at 15:10
  • 1
    @KaushikNayak it's interesting but TRIM doesn't work even used as `trim(leading ' ' from t1.f_content)`. Because of this, I considered `regexp_replace( ... , '(^[[:space:]]+)' )` – Barbaros Özhan May 04 '19 at 15:14
  • Well, then it's not a space character, should be something else. – Kaushik Nayak May 04 '19 at 15:17
  • Can I add parallel hints OR partitioning to speed this up? – dang May 04 '19 at 15:53
  • I'm basically doing a create table as and it is taking long time to write it. Not sure if there is a way to improve performance. – dang May 04 '19 at 15:55
  • @dang do you want to make spesific text search inside CLOB, why do you need performance? – Barbaros Özhan May 04 '19 at 16:06
  • @BarbarosÖzhan - I am splitting CLOB to multiple rows separated by *. For ex. ABC*01*34 need to become 3 rows with values - ABC, 01, 34. So, first step is breaking it down by lines and I want to store it in a table. Then the next step is to break those lines into separate rows. While writing to separate lines using the SQL query as per above, it is taking long time to create table. So, I am wondering if there is a way to write faster. – dang May 04 '19 at 16:08
  • @dang perhaps, creating an `ctxsys.context` type index on `f_content` column to improve query performance. – Barbaros Özhan May 04 '19 at 16:15
  • Can you tell me how to create this index? – dang May 04 '19 at 16:17
  • @dang `create index idx_tab1_fcontent on table1(f_content) indextype is ctxsys.context` – Barbaros Özhan May 04 '19 at 16:20
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192841/discussion-between-barbaros-ozhan-and-dang). – Barbaros Özhan May 04 '19 at 17:43