2

I'm splitting a string p_value using p_delimeter, which may contain one or more symbols (that's why regexp is not like often used [^,]+).

In most of the cases the following query works predictably, but I'm dazed with a case when:

  1. string p_value contains line break chr(10),
  2. p_value doesn't contain p_delimeter as substring,

so I expect to have one row with the whole p_value as the result, but got only the remainder after line break.

It's supposed here that regexp treats line break as ordinary symbol, since the 'm' modifier is absent in call to regexp_substr.

Please, explain is this behavior correct and how to get the expected result.


WITH 
  params AS (SELECT 'ab' || chr(10) || 'cd' p_value,
                    'xxx' p_delimeter
               FROM dual
  )          
SELECT regexp_substr(p_value, '(.*?)(' || p_delimeter || '|$)', 1, level, 'c', 1) AS CUT
  FROM params
CONNECT BY regexp_substr(p_value, '(.*?)(' || p_delimeter || '|$)', 1, level, 'c', 1) IS NOT NULL;


Actual result:      Expected result:
-----               ------
 CUT                 CUT
-----               ------ 
 cd                  ab/cd 
                       ^
                       'this is just a marker for a line break [= chr(10)]'
diziaq
  • 6,881
  • 16
  • 54
  • 96
  • Line breaks are matched with `.` when you pass `n` modifier, use `'cn'`. *If the `n` flag is set, it matches the newline character. The newline is recognized as the linefeed character (`\x0a`) on UNIX and Windows or the carriage return character (`\x0d`) on Macintosh platforms*. See [docs](https://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm). – Wiktor Stribiżew Mar 03 '16 at 09:18
  • @Wiktor, thank you, it's useful. But if we add `'n'` modifier, then line break cannot be used as `p_delimeter`. – diziaq Mar 03 '16 at 09:24
  • A line feed can be used as a delimiter since you are using a lazy quantifier `*?` with `.`. – Wiktor Stribiżew Mar 03 '16 at 09:28
  • FYI - You mention the oft-used regex pattern of `[^,]+` to split delimited strings. It fails if there is a NULL in the list and it should be avoided. See here for proof and an alternative regex: http://stackoverflow.com/questions/31464275/split-comma-separated-values-to-columns-in-oracle/31464699#31464699 – Gary_W Mar 03 '16 at 14:36

1 Answers1

2

Allow the . pattern to match all characters by adding the n flag to the regular expression:

WITH params ( p_value, p_delimiter ) AS (
  SELECT 'ab' || chr(10) || 'cd', 'xxx' FROM dual
)          
SELECT REGEXP_SUBSTR(p_value, '(.*?)(' || p_delimeter || '|$)', 1, level, 'cn', 1) AS CUT
  FROM params
CONNECT BY LEVEL < REGEXP_COUNT( p_value, '(.*?)(' || p_delimeter || '|$)' );

or you can use a simple function:

Oracle Setup:

CREATE TYPE VARCHAR2_TABLE AS TABLE OF VARCHAR2(4000);
/

CREATE OR REPLACE FUNCTION split_String(
  i_str    IN  VARCHAR2,
  i_delim  IN  VARCHAR2 DEFAULT ','
) RETURN VARCHAR2_TABLE DETERMINISTIC
AS
  p_result       VARCHAR2_TABLE := VARCHAR2_TABLE();
  p_start        NUMBER(5) := 1;
  p_end          NUMBER(5);
  c_len CONSTANT NUMBER(5) := LENGTH( i_str );
  c_ld  CONSTANT NUMBER(5) := LENGTH( i_delim );
BEGIN
  IF c_len > 0 THEN
    p_end := INSTR( i_str, i_delim, p_start );
    WHILE p_end > 0 LOOP
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, p_end - p_start );
      p_start := p_end + c_ld;
      p_end := INSTR( i_str, i_delim, p_start );
    END LOOP;
    IF p_start <= c_len + 1 THEN
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, c_len - p_start + 1 );
    END IF;
  END IF;
  RETURN p_result;
END;
/

Query:

WITH params ( p_value, p_delimiter ) AS (
  SELECT 'ab' || chr(10) || 'cd', 'xxx' FROM dual
)          
SELECT COLUMN_VALUE AS CUT
FROM   params,
       TABLE( split_String( p_value, p_delimiter ) );
MT0
  • 143,790
  • 11
  • 59
  • 117