1

Hi I need to write a regex with following logic:

Split at every second comma except if the character '\' is before the comma.

Maybe an example to make it clear:

1,1a,2,2a,3,3a\,b,4,4a

Should get the result:

1,1a

2,2a

3,3a\,b

4,4a

This is my following code:

SELECT REGEXP_SUBSTR (text, '[^,]+,[^,]+', 1, LEVEL) TXT
FROM DUAL
CONNECT BY REGEXP_SUBSTR (text, '[^,]+,[^,]+', 1, LEVEL) IS NOT NULL;

So my regex at the moment is: '[^,]+,[^,]+' which split at every second comma.

ZerOne
  • 1,296
  • 6
  • 20
  • 40
  • I just ran your query, it looks fine to me.. What's the issue? Why does `3,3a\,b` takes 3 commas and not 2 ? – sagi Sep 14 '16 at 14:09
  • Looks \ is escape character in oracle. You need to change your string as `1,1a,2,2a,3,3a\\,b,4,4a` – dinesh Sep 14 '16 at 14:10
  • @sagi Because this could also be a text like "Test\, Test2".. So if a \ is the character before the comma, this comma should be ignored. – ZerOne Sep 14 '16 at 14:12

1 Answers1

1

Try (\\,|[^,])+ instead of plain [^,]+

J Earls
  • 1,792
  • 8
  • 12
  • Thanks works perfect! My regex now is: `(\\,|[^,])+,(\\,|[^,])+` – ZerOne Sep 14 '16 at 14:24
  • FYI - @ZerOne Make sure you always have sets separated by a comma else you will get incorrect results. i.e. if you put an extra comma in between the two 1's, the results will be wrong. This may not be an issue depending on how your your data is scrubbed on the way in. Just making you aware of a possible caveat as the regex format `[^,]+` commonly used for parsing lists does not handle NULL elements. See http://stackoverflow.com/a/31464699/2543416 for more info. – Gary_W Sep 14 '16 at 15:22