0

I have a string that contains multiple substrings that are separated by a delimiter character. substr1#substr2#substr3...#substrN

I want to query for all the values in a column that are also in this string.

What I have so far is:

SELECT substring_col
FROM table
WHERE id IN SUBSTR(str_to_search,INSTR(str_to_search,substring_col),INSTR(str_to_search,'#',INSTR(str_to_search,substring_col))-1)

However, this only returns the first substring that is found. How can I make it return all substrings that are found?

casey
  • 196
  • 10
  • 1
    Could you share an example? This sounds easy but somehow I want to confirm if I am getting what you are asking. – Ankur Sinha Jun 29 '18 at 14:33
  • Probably duplicate of https://stackoverflow.com/questions/51045592/extract-nth-substring/51046010#51046010 – wolφi Jun 29 '18 at 15:50

3 Answers3

1

Awful data format, and there are a lot of reasons why you should change it.

But, sometimes we are stuck with other people's really, really bad formats. One method is to use like:

where '#' || listcol || '#' like '%#' || id || '#%'
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

I'm afraid that regex of the form '[^#]+' does not handle NULL elements. Unfortunately it's the most common answer for questions on parsing delimited strings. For proof and details see post: https://stackoverflow.com/a/31464699/2543416. Using it, the data set with a NULL element 2 gives the following result set:

SUBS
-----------
substr1
substr3
substrN
<NULL here>

SQL>

Instead use this form built on Littlefoot's answer (Note element 2 is NULL):

with test as (select 'substr1##substr3#substrN' col from dual)
    select regexp_substr(col, '(.*?)(#|$)', 1, level, NULL, 1) subs
    from test
    connect by regexp_substr(col, '(.*?)(#|$)', 1, level) is not null;

SUBS
-----------
substr1

substr3
substrN

SQL>

Here the 2nd element's NULL is preserved and the remaining values are in the correct position.

For your case, you may not care about the position of the value, just that it is in the list. But, for re usability (and for accuracy), you could turn this into a function where you pass in the string, the delimiter, and the value you are after and have it return it's position. Non-zero means it is in the list and also you have it's position if that is ever needed. Just a thought.

Gary_W
  • 9,933
  • 1
  • 22
  • 40
0

If I understood you correctly, splitting that long delimited string might be what you're looking for.

Here's how:

SQL> with test as (select 'substr1#substr2#substr3#substrN' col from dual)
  2  select regexp_substr(col, '[^#]+', 1, level) subs
  3  from test
  4  connect by level <= regexp_count(col, '#') + 1;

SUBS
--------------------------------------------------------------------------------
substr1
substr2
substr3
substrN

SQL>

It means that your query might look like this:

SELECT substring_col
FROM table
WHERE id IN (SELECT regexp_substr('substr1#substr2#substr3#substrN', '[^#]+', 1, level) subs
             FROM dual
             CONNECT BY level <= regexp_count('substr1#substr2#substr3#substrN', '#') + 1
            );

The delimited string is probably a parameter; I guess you can rewrite the above code in that manner.

Littlefoot
  • 131,892
  • 15
  • 35
  • 57