0

I need to split a string like this

RANDOM(NUL)THIS_SHOULD_BE_SPLIT(NUL)~THIS_IS_NEW(NUL)STRING(NUL)~THIS_IS_ANOTHER_STRING(NUL)

(NUL) means a character with ASCII value 0. Don't know how to type it here.

I need to split this when (NUL)~ occurs.

So the expected result after split is

  1. RANDOM(NUL)THIS_SHOULD_BE_SPLIT
  2. THIS_IS_NEW(NUL)STRING
  3. THIS_IS_ANOTHER_STRING(NUL)

Using OracleDB not sure how it handles NULL characters and if it is possible to split it.

To split it using ~ I am using

SELECT REGEXP_SUBSTR(stringval,'[^~]+', 1, LEVEL) AS error_code
FROM   DUAL
CONNECT BY REGEXP_SUBSTR(stringval, '[^~]+',1,LEVEL) IS NOT NULL;

Have no idea on how to incorporate NULL character here.

MT0
  • 143,790
  • 11
  • 59
  • 117
Mike Reddington
  • 192
  • 2
  • 15

2 Answers2

1

You cannot directly match the ASCII 0 (NUL) character in a regular expression as the regular expression parser will treat an ASCII 0 (NULL) character as a string terminator and end the regular expression pattern rather than using it as a character to match. So, if you want to use regular expressions, you will need to replace all occurrences of the ASCII 0 (NUL) character with a placeholder string and then match that when splitting and then restore the ASCII 0 (NUL) characters that were not split upon afterwards. But you will need to ensure that your placeholder is never going to occur elsewhere in your string.

Instead, you can use CHR(0) to match the NUL character. This method uses simple string functions (rather than slow regular expressions that cannot directly handle ASCII 0 (NUL) characters) and can handle multiple input rows:

WITH data ( value ) AS (
  SELECT 'RANDOM' || CHR(0)
         || 'THIS_SHOULD_BE_SPLIT' || CHR(0)
         || '~THIS_IS_NEW' || CHR(0)
         || 'STRING' || CHR(0)
         || '~THIS_IS_ANOTHER_STRING' || CHR(0)
  FROM   DUAL
UNION ALL
  SELECT '12345' || CHR(0)
         || '67890' || CHR(0)
         || '~23456'
  FROM   DUAL
),
split_positions ( value, start_pos, end_pos ) AS (
  SELECT value, 1, INSTR( value, CHR(0) || '~' )
  FROM   data
UNION ALL
  SELECT value, end_pos + 2, INSTR( value, CHR(0) || '~', end_pos + 2 )
  FROM   split_positions
  WHERE  end_pos > 0
)
SELECT CASE end_pos
       WHEN 0
       THEN SUBSTR( value, start_pos )
       ELSE SUBSTR( value, start_pos, end_pos - start_pos )
       END AS value
FROM   split_positions;

Which outputs:

| VALUE                           |
| :------------------------------ |
| RANDOM(NUL)THIS_SHOULD_BE_SPLIT |
| 12345(NUL)67890                 |
| THIS_IS_NEW(NUL)STRING          |
| 23456                           |
| THIS_IS_ANOTHER_STRING(NUL)     |

(Note: the NUL character has again been replaced by (NUL) string in the output as db<>fiddle did not like displaying that character.)

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Please explain the query. @MTO – Mike Reddington May 28 '20 at 11:47
  • @Akshith The first sub-query factoring clause (`WITH data .... `) has the input strings. The second sub-query factoring clause (`split_positions ....`) is a recursive query that starts finding the first `(NUL)~` and then iterates finding successive positions. The final `SELECT ...` statement just outputs the sub-strings between the positions that the previous recursive SQFC has found. – MT0 May 28 '20 at 11:51
  • @Akshith Also added a section on why it won't directly work with regular expressions due to how the underlying parser expects `ASCII 0 (NUL)` characters to be treated. – MT0 May 28 '20 at 12:01
0

Would something like this be ok for you:

SELECT REGEXP_SUBSTR(replace(stringval, '(NUL)~', '~'),'[^~]+', 1, LEVEL) 
       AS error_code 
FROM test
CONNECT BY REGEXP_SUBSTR(stringval, '[^~]+',1,LEVEL) IS NOT NULL;

here is a demo

VBoka
  • 8,995
  • 3
  • 16
  • 24
  • No, as there might be other ~ which are not part of (NUL)~ – Mike Reddington May 28 '20 at 11:27
  • By the way "(NUL)" is not a string. It is the character with ASCII value 0. I don't know how to represent it here or in SQL so I used (NUL) like a placeholder. – Mike Reddington May 28 '20 at 11:37
  • Ok but then you can just replace "(NUL)~" with something special like -xxx- and then use -xxx- as separator... I understand that the "(NULL)" is a placeholder... – VBoka May 28 '20 at 11:45