converting the data with regexp in oracle sql

Question

I have a data like below with tab limited among them. I have represented them with a view here

with t_view as (select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
union
select '6-20  6-20  6-20  6-20  6-20  ' from dual
union
select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' from dual)

My expected output is

Mon: 6-21, Tue: 6-21, Wed: 6-21, Thu: 6-21, Fri: 6-21, Sat: 6-21, Sun: 6-21
Mon: 6-20, Tue: 6-20, Wed: 6-20, Thu: 6-20, Fri: 6-20
Mon: 6-9, Tue: 6-9, Wed: 6-9, Thu: 6-9, Fri: 6-9, Sat: 6-9, Sun: 6-9

I thought of replacing all those horizontal table with some unique patterns like this and then replace that pattern with Mon, Tue based on indexing

$1(6-20 )$2(6-20 )$3(6-20 )$4(6-20 )$5(6-20 )

I have tried the below query but could not complete it

select regexp_replace(col, '([[:digit:]]-[[:digit:]]{2}[[:space:]]+)','$(\1)') from t_view;

You cannot use one regex replacement here since it requires 2 different replacement strings (as the number of days is not constant). — Wiktor Stribiżew, Sep 07 '15 at 08:48

Lalit Kumar B · Answer 1 · 2015-09-07T11:23:15.693

3

You need a combination of CASE expression, REGEXP_COUNT and REGEXP_REPLACE since you do not have the same expression for all the rows. Depending on the data, you could have as many conditions in the case expression.

The regular expression pattern is (\d-\d+ ).

For example,

SQL> WITH t_view(col) AS
  2    ( SELECT '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' FROM dual
  3    UNION
  4    SELECT '6-20  6-20  6-20  6-20  6-20  ' FROM dual
  5    UNION
  6    SELECT '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' FROM dual
  7    )
  8  SELECT REPLACE(new_col, '  ','') new_col
  9  FROM (
 10    SELECT
 11      CASE
 12        WHEN regexp_count(col, '\d+\-\d+') = 5
 13        THEN regexp_replace(col,
 14                            '(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )',
 15                            'Mon: \1, Tue: \2,Wed: \3,Thu: \4,Fri: \5')
 16        WHEN regexp_count(col, '\d+\-\d+') = 7
 17        THEN regexp_replace(col,
 18                            '(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )',
 19                            'Mon: \1, Tue: \2,Wed: \3,Thu: \4,Fri: \5,Sat: \6,Sun: \7')
 20      END new_col
 21    FROM t_view
 22  );

NEW_COL
----------------------------------------------------------------------------------------------------
Mon: 6-20, Tue: 6-20,Wed: 6-20,Thu: 6-20,Fri: 6-20
Mon: 6-21, Tue: 6-21,Wed: 6-21,Thu: 6-21,Fri: 6-21,Sat: 6-21,Sun: 6-21
Mon: 6-9, Tue: 6-9,Wed: 6-9,Thu: 6-9,Fri: 6-9,Sat: 6-9,Sun: 6-9

SQL>

edited Sep 07 '15 at 11:23

answered Sep 07 '15 at 08:58

Lalit Kumar B

47,486
13
97
124

What happens if there are 1, 2 or 20 values in an input? – MT0 Sep 07 '15 at 09:08
1

@MT0 I already explained that in my answer. The expression is not same for all rows, so **CASE** could be used. But tell me, which calendar has more than 7 days in a week? Logically, OP's question is concatenating the Day name to each pattern in the regular expression. I know it could also be done using [**row generator method and string split**](http://lalitkumarb.com/2015/03/04/split-comma-delimited-strings-in-a-table-in-oracle/). But I focused on **`regexp_replace`** as OP mentioned it in the question. – Lalit Kumar B Sep 07 '15 at 09:12
If there are 10 days in the input then you want a full week plus 3 additional days of the next week - a point you seem to have missed when you bring up different calendars. With this method you are going to have to hand code a `CASE` for each potential length of input - and if you miss one then you will get `NULL` instead of the expected output. Also - you have whitespace before the commas. – MT0 Sep 07 '15 at 09:18
@MTO The whitespace before the commas...hmm...replaced ;-) – Lalit Kumar B Sep 07 '15 at 09:20
This technique won't scale to more than 9 days - as Oracle can only do single digit references to regular expression capture groups. Also, while you've edited the output to fix the whitespace the SQL in the `REGEXP_REPLACE` has not changed so does not produce your current output. – MT0 Sep 07 '15 at 09:50
@MTO Oh come on! You just thought that I simply manipulated the output? I used `REPLACE` function, re-executed the query and edited my answer. You are free to copy the code and verify at your end. – Lalit Kumar B Sep 07 '15 at 09:57
My apologies - you have a nested `REPLACE` which I hadn't seen - you can eliminate it by moving the brackets on the capture groups from `(\d-\d+ )` (which will include the trailing space in the capture group) to `(\d-\d+) `. – MT0 Sep 07 '15 at 10:08

MT0 · Answer 2 · 2015-09-07T09:23:47.690

2

SQL Fiddle

Oracle 11g R2 Schema Setup:

Query 1:

with t_view ( col ) as (
      select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' from dual
union select '6-20  6-20  6-20  6-20  6-20  ' from dual
union select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  6-9  6-9' from dual
union select '6-1' from dual
union select '6-1 6-2' from dual
),
days ( id, day ) AS (
            SELECT 1, 'Mon' FROM DUAL
  UNION ALL SELECT 2, 'Tue' FROM DUAL
  UNION ALL SELECT 3, 'Wed' FROM DUAL
  UNION ALL SELECT 4, 'Thu' FROM DUAL
  UNION ALL SELECT 5, 'Fri' FROM DUAL
  UNION ALL SELECT 6, 'Sat' FROM DUAL
  UNION ALL SELECT 0, 'Sun' FROM DUAL
),
matches ( col, idx, day ) AS (
  SELECT col,
         COLUMN_VALUE,
         day || ': ' || REGEXP_SUBSTR( t.col, '\d+-\d+', 1, COLUMN_VALUE )
  FROM   t_view t,
         TABLE(
           CAST(
             MULTISET(
               SELECT LEVEL
               FROM   DUAL
               CONNECT BY LEVEL <= REGEXP_COUNT( t.col, '\d+-\d+' )
             )
             AS SYS.ODCINUMBERLIST
           )
         ) l
         INNER JOIN days d
         ON ( MOD( l.COLUMN_VALUE, 7 ) = d.id )
)
SELECT LISTAGG( day, ', ' ) WITHIN GROUP ( ORDER BY IDX ) AS col
FROM   matches
GROUP BY col

Results:

|                                                                                      COL |
|------------------------------------------------------------------------------------------|
|                                                                                 Mon: 6-1 |
|                                                                       Mon: 6-1, Tue: 6-2 |
|                                    Mon: 6-20, Tue: 6-20, Wed: 6-20, Thu: 6-20, Fri: 6-20 |
|              Mon: 6-21, Tue: 6-21, Wed: 6-21, Thu: 6-21, Fri: 6-21, Sat: 6-21, Sun: 6-21 |
| Mon: 6-9, Tue: 6-9, Wed: 6-9, Thu: 6-9, Fri: 6-9, Sat: 6-9, Sun: 6-9, Mon: 6-9, Tue: 6-9 |

edited Sep 07 '15 at 09:23

answered Sep 07 '15 at 09:02

MT0

143,790
11
59
117

`LISTAGG` is only supported on `11gR2` and up. – Lalit Kumar B Sep 07 '15 at 09:13
There are multiple [other string aggregation techniques](https://oracle-base.com/articles/misc/string-aggregation-techniques) that can be used in earlier versions (if `LISTAGG` is not available). – MT0 Sep 07 '15 at 09:22
@MTO From performance point of view, your solution will be extremely slow. Compare the explain plans. – Lalit Kumar B Sep 07 '15 at 09:25
@LalitKumarB This will handle *ANY* length of input - if you need flexibility then you will need to do something like this (or a custom function) - if you only want a fixed set of inputs then you can go with a more inflexible solution which will, of course, be more performant as it has to cope with less. – MT0 Sep 07 '15 at 09:31
@MTO a hundred lines of more code is better than spending a 100 seconds more :-) Given that the data scalability is known in advance. – Lalit Kumar B Sep 07 '15 at 09:34
@LalitKumarB If you run the SQLFIDDLE linked in the question you will see that it returns in (for me) between 1 and 5ms. – MT0 Sep 07 '15 at 09:54
@MTO And you mean that performance could be measured with 5 rows of input data? – Lalit Kumar B Sep 07 '15 at 09:59
@MTO In any case, it is up to the OP to chose what he wants. We have helped with different approaches and techniques. – Lalit Kumar B Sep 07 '15 at 10:00

Ilia Maskov · Answer 3 · 2015-09-07T11:37:09.330

1

Why can't we use this simple way? Looks good as for me

SELECT 'Mon: '||regexp_substr(col,'\d+\-\d+',1,1) ||
       ', Tue: '||regexp_substr(col,'\d+\-\d+',1,2) ||
       ', Wed: '||regexp_substr(col,'\d+\-\d+',1,3) ||
       ', Thu: '||regexp_substr(col,'\d+\-\d+',1,4) ||
       ', Fri: '||regexp_substr(col,'\d+\-\d+',1,5) ||  
       ', Sat: '||regexp_substr(col,'\d+\-\d+',1,6) ||
       ', Sun: '||regexp_substr(col,'\d+\-\d+',1,7)
 FROM t_view

Obviously it's easy to eliminate empty Sat Sun, for example with nvl2:

SELECT 'Mon: '||regexp_substr(col,'\d+\-\d+',1,1) ||
       ', Tue: '||regexp_substr(col,'\d+\-\d+',1,2) ||
       ', Wed: '||regexp_substr(col,'\d+\-\d+',1,3) ||
       ', Thu: '||regexp_substr(col,'\d+\-\d+',1,4) ||
       ', Fri: '||regexp_substr(col,'\d+\-\d+',1,5) ||
       nvl2(regexp_substr(col,'\d+\-\d+',1,6), 
       ', Sat: '||regexp_substr(col,'\d+\-\d+',1,6) ||
       ', Sun: '||regexp_substr(col,'\d+\-\d+',1,7),null)
 FROM t_view

You should keep in mind, that this is just example, and if you can get data with any number of day missed you should use nvl2 in more places

edited Sep 07 '15 at 11:37

answered Sep 07 '15 at 08:54

Ilia Maskov

1,858
16
26

That will leave the extra concatenated values where the number of values are less than 7. It could be handled using case expression. – Lalit Kumar B Sep 07 '15 at 09:05
This will include `Sat` and `Sun` even if there are not enough values in the input to include them. – MT0 Sep 07 '15 at 09:07
I think we can avoid this by adding the replace function above the result we got and replace the strings that dont have any digits associated with it, in our case it will be Sat and Sun – arunb2w Sep 07 '15 at 11:23
@arunb2w, obviously it's easy to eliminate empty Sat Sun, as you've said with replace or with nvl2 wich simpler as for me, so the rest of query will be (i've added to the answer) – Ilia Maskov Sep 07 '15 at 11:37

Maheswaran Ravisankar · Answer 4 · 2015-09-07T10:58:24.437

Considering Space as delimiter, tokenise into rows(using levels) and rejoin using LISTAGG() , using the level as day generator (TO_CHAR(TRUNC(SYSDATE,'D')+level)

with t_view as
(
  select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
  union all
  select '6-20  6-20  6-20  6-20  6-20  ' from dual
  union all
  select '6-9  6-9  6-9  6-9  6-9  6-9  6-9 6-9 ' from dual
)
SELECT LISTAGG(TO_CHAR(TRUNC(SYSDATE,'D')+level1,'Dy')||': '||
               REGEXP_SUBSTR(col,'[^ ]+',1,LEVEL1),', ')
        WITHIN GROUP (ORDER BY level1 )
from
(
SELECT  col,level level1
  FROM t_view
CONNECT BY REGEXP_SUBSTR(col,'[^ ]+',1,LEVEL) IS NOT NULL
   AND PRIOR col = col
   AND PRIOR sys_guid() IS NOT NULL
)
group by col;

score 1 · Answer 5 · answered Sep 07 '15 at 11:13

This is my try. It is not different too much from MTo's version. The ideea is the same: transform strings into lines, add information about day, then regroup the record.

with week as (
  select 1 day_num, 'Mon' day_name from dual union all
  select 2 day_num, 'Tue' day_name from dual union all
  select 3 day_num, 'Wed' day_name from dual union all
  select 4 day_num, 'Thu' day_name from dual union all
  select 5 day_num, 'Fri' day_name from dual union all
  select 6 day_num, 'Sat' day_name from dual union all
  select 7 day_num, 'Sun' day_name from dual
),
t_view as (select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
                union all
                select '6-20  6-20  6-20  6-20  6-20  ' from dual
                union all
                select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' from dual
                ),
lines as(
  select 
    col, WEEK.DAY_NAME, l, trim(regexp_substr(col, '[^,]+', 1, L)) elem
  from (
      select regexp_replace(col,'([[:digit:]]-[[:digit:]]{1,2}[[:space:]]+)','\1,')  col
      from t_view
      )
      join (select level l from dual connect by level < 10)
  on instr(col, ',', 1, L ) > 0
  join week on WEEK.DAY_NUM = l
  order by col,l
  )
select listagg(day_name||':'||elem,' ')  within group (order by l)
from lines
group by col;

Result:

Mon:6-20 Tue:6-20 Wed:6-20 Thu:6-20 Fri:6-20
Mon:6-21 Tue:6-21 Wed:6-21 Thu:6-21 Fri:6-21 Sat:6-21 Sun:6-21
Mon:6-9 Tue:6-9 Wed:6-9 Thu:6-9 Fri:6-9 Sat:6-9 Sun:6-9

converting the data with regexp in oracle sql

5 Answers5