0

I want to use a list of substrings as the values for my LIKE clause.

Consider the following tables:

Strains (s)

id    name
-------------------------------------------------------
562   B6;129 (Bnip3 KO)
563   B6;129 (BNIP3 Wt) [pregnant]
564   B6;129 (BNIP3 Wt) [older than 21 days]
720   BALB/C T(x:11)38H (T38H)
721   BALB/C [older than 21 days]

Links (l)

id    protocol_id     strain_id
-------------------------------------------------------
1     61846           563
2     13487           564
3     79465           721
4     41699           720

Animals (a)

id    group_id        strain_id
-------------------------------------------------------
24    9666            563
25    9666            720

Normally I would provide a query that I've tried but this time I have nothing. Instead let me break down the steps I need to do:

  1. Get a list of substrings from the name of the strain for all animals under the same group_id.
    • To get the substrings, I want the strings inside ( and )
    • Something like SELECT /* get the list of substrings */ FROM animals a LEFT JOIN strains s ON s.id = a.strain_id WHERE a.group_id = 9666
    • In this example, I want the following list: (BNIP3 Wt) and (T38H)
    • In the case that there are multiple values that are enclosed in ( and ), just use the last one
  2. Use the resulting list above for the LIKE clause to get the list of protocol_id whose strain names contain the substrings.

My idea of a solution

SELECT l.protocol_id
FROM links l
LEFT JOIN strains s
ON s.id = l.strain_id
WHERE s.name LIKE (/* put the list here with the % for wildcards */)

The end result that I want to get is the following:

protocol_id
------------------------------
61846
13487
41699
dokgu
  • 4,957
  • 3
  • 39
  • 77
  • Possible duplicate of [How to load a large number of strings to match with oracle database?](http://stackoverflow.com/questions/34699223/how-to-load-a-large-number-of-strings-to-match-with-oracle-database) – MT0 Mar 22 '16 at 15:03

2 Answers2

1

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE Strains (id, name ) AS
SELECT 562,   'B6;129 (Bnip3 KO)' FROM DUAL UNION ALL
SELECT 563,   'B6;129 (BNIP3 Wt) [pregnant]' FROM DUAL UNION ALL
SELECT 564,   'B6;129 (BNIP3 Wt) [older than 21 days]' FROM DUAL UNION ALL
SELECT 720,   'BALB/C T(x:11)38H (T38H)' FROM DUAL UNION ALL
SELECT 721,   'BALB/C [older than 21 days]' FROM DUAL;

CREATE TABLE Links (id, protocol_id, strain_id ) AS
SELECT 1,     61846,           563 FROM DUAL UNION ALL
SELECT 2,     13487,           564 FROM DUAL UNION ALL
SELECT 3,     79465,           721 FROM DUAL UNION ALL
SELECT 4,     41699,           720 FROM DUAL;

CREATE TABLE Animals (id, group_id, strain_id ) AS
SELECT 24,    9666,            563 FROM DUAL UNION ALL
SELECT 25,    9666,            720 FROM DUAL;

Query 1:

SELECT l.protocol_id
FROM   links l
       INNER JOIN strains s
       ON s.id = l.strain_id
       INNER JOIN (
         SELECT REGEXP_SUBSTR( s.name, '\(.*?\)', 1, l.COLUMN_VALUE ) AS id
         FROM   strains s,
                TABLE( 
                  CAST(
                    MULTISET(
                      SELECT LEVEL
                      FROM   DUAL
                      CONNECT BY LEVEL <= REGEXP_COUNT( s.name, '\(.*?\)' )
                    ) AS SYS.ODCINUMBERLIST
                  )
                ) l,
                animals a
         WHERE  a.strain_id = s.id
         AND    a.group_id = 9666
       ) t
       ON s.name LIKE '%' || t.id || '%'

Results:

| PROTOCOL_ID |
|-------------|
|       61846 |
|       13487 |
|       41699 |
|       41699 |
MT0
  • 143,790
  • 11
  • 59
  • 117
  • This looks great but how did you get the values you passed into `SYS.ODCIVARCHAR2LIST`? – dokgu Mar 22 '16 at 15:13
  • The substrings for the strain names must not be hard-coded. – dokgu Mar 22 '16 at 15:32
  • @PatrickGregorio Updated - add `DISTINCT` if you only want unique values. – MT0 Mar 22 '16 at 15:41
  • Based on your SQL Fiddle I can confirm that this works. Is there a reason why you're not using `SUBSTR` and `INSTR`? I think those 2 should be enough (currently reading how to use them). – dokgu Mar 22 '16 at 15:47
  • I tried your solution on my actual data and I'm getting a lot of `protocol_id` even with `DISTINCT`. Does your solution ignore rows where it doesn't find any `(` and `)`? – dokgu Mar 22 '16 at 16:03
  • [SQL Fiddle](http://sqlfiddle.com/#!4/8f7fb1/16) - I added a small validation to make sure that there is a substring found. `INSTR(s.name, '(', 1, 1) > 0`. Thanks! – dokgu Mar 22 '16 at 16:27
  • The regular expressions will check for `(` and `)` and will only return rows containing those patterns. Adding `INSTR` or `SUBSTR` is unnecessary and would just do exactly the same thing but in a much more convoluted way. – MT0 Mar 22 '16 at 18:04
0

You can maybe solve the problem using REGEXP_LIKE

e.g.

With strains (id, name) as
(
          select 562,   'B6;129 (Bnip3 KO)' from dual
union all select 563,   'B6;129 (BNIP3 Wt) [pregnant]'  from dual
union all select 564,   'B6;129 (BNIP3 Wt) [older than 21 days]'  from dual
union all select 720,   'BALB/C T(x:11)38H (T38H)'  from dual
union all select 721,   'BALB/C [older than 21 days]'  from dual
)
    ,Links (id, protocol_id, strain_id ) as
(    
          select 1,     61846,           563 from dual
union all select 2,     13487,           564 from dual
union all select 3,     79465,           721 from dual
union all select 4,     41699,           720 from dual
)
   ,Animals (id, group_id, strain_id) as
(   
          select 24,    9666,            563 from dual
union all select 25,    9666,            720 from dual
)
SELECT l.protocol_id
FROM links l
LEFT JOIN strains s ON s.id = l.strain_id
WHERE  REGEXP_LIKE (s.name, '*BNIP3|T38H*');
Ricardo Arnold
  • 903
  • 1
  • 12
  • 21