1

I have attempted to use this question here Splitting string into multiple rows in Oracle and adjust it to my needs however I'm not very confident with regex and have not been able to solve it via searching.

Currently that questions answers it with a lot of regex_substr and so on, using [^,]+ as the pattern so it splits by a single comma. I need it to split by a multi-character delimiter (e.g. #;) but that regex pattern matches any single character to split it out so where there are #s or ;s elsewhere in the text this causes a split.

I've worked out the pattern (#;+) will match every group of #; but I cannot workout how to invert this as done above to split the row into multiple.

I'm sure I'm just missing something simple so any help would be greatly appreciated!

Ryan Hill
  • 136
  • 1
  • 8
  • As a workaround, you may replace `#;` with some char that is not used in the actual data (usually, some control char is used for this), and then use the solution you linked to in your question. – Wiktor Stribiżew Jul 22 '18 at 13:25
  • Thanks @WiktorStribiżew , that's what I'm doing at the moment to workaround it, just trying to streamline it anyway I can to improve performance on large datasets. – Ryan Hill Jul 22 '18 at 13:40

1 Answers1

0

I think you should use:

[^#;+]+

instead of

(#;+)

As, it will be checking for any one of the characters in the range which can be # ; or + and then you can split accordingly.

You can change it according to your requirement but in the regex I shared, I am consudering # , ; and + as delimeter

So, in end, the query would look something like this:

with tbl(str) as (
  select ' My, Delimiter# Hello My; Delimiter World My Delimiter  My Delimiter test My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'([^#;+]+)', 1, LEVEL, NULL, 1 ) AS element_value
FROM   tbl
CONNECT BY LEVEL <= regexp_count(str, '[#;+]')+1\\

Output:

ELEMENT ELEMENT_VALUE
1       My, Delimiter
2       Hello My
3       Delimiter World My Delimiter  My Delimiter test My Deli

-- EDIT --

In case you want to check unlimited numbers of # or ; to split and don't want to split at one existence, I found the below regex, but again that is not supported by Oracle.

(?:(?:(?![;#]+).#(?![;#]+).|(?![;#]+).;(?![;#]+).|(?![;#]+).)*)+

So, I found no easy apart from below query which will not split on single existence if there is only one such instance between two delimeters:

  select ' My, Delimiter;# Hello My Delimiter ;;# World My Delimiter ;  My Delimiter test#; My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'([^#;]+#?[^#;]+;?[^#;]+)', 1, LEVEL, NULL, 1 ) AS element_value
FROM   tbl
CONNECT BY LEVEL <= regexp_count(str, '[#;]{2,}')+1\\

Output:

ELEMENT ELEMENT_VALUE
1    My, Delimiter
2    Hello My Delimiter 
3    World My Delimiter ;  My Delimiter test
4    My Delimiter 
Aman Chhabra
  • 3,824
  • 1
  • 23
  • 39
  • Unfortunately it's a single multicharcter delimiter #; rather than multiple single character delimiters # or ; . The + in the pattern I gave was just to match unlimited numbers of those characters. I don't want it to split for a single # or ; – Ryan Hill Jul 22 '18 at 17:22
  • @RyanHill I have updated my answer... Hope that will help – Aman Chhabra Jul 23 '18 at 02:24