1

I have 2 tables name and match. The name and match table have columns type. The columns and data in the name table

ID| type  |
--| ----  |
 1| 1ABC  |
 2| 2DEF  |
 3| 3DEF  |
 4| 4IJK  |

The columns and data in match table is

type DATA
NOT %ABC% AND NOT %DEF% NOT ABC AND NOT DEF
%DEF% DEF ONLY
NOT %DEF% AND NOT %IJK% NOT DEF AND NOT IJK

I have tried using case statement. The first 3 characters will be NOT if there is a NOT in the type in match table. The below query is giving me a missing keyword error. I am not sure what I am missing here

SELECT s.id, s.type, m.data
where case when substr(m.type1,3)='NOT' then s.type not in (REPLACE(REPLACE(m.type,'NOT',''),'AND',','))
          ELSE s.type in (m.type) end
from source s, match m;

I need the output to match the type in source column and display the data in match column.

The output should be

ID|type|DATA
1 |1ABC|NOT DEF AND NOT IJK
2 |2DEF|DEF ONLY
3 |3DEF|DEF ONLY
4 |4IJK|NOT ABC AND NOT DEF
  • 1
    Please describe the requirement to get the result you need and rules relate these tables and transform the data. It's not very clear how did you remove some part of a text in the output – astentx Nov 15 '22 at 21:01
  • Sorry it was a typo in the output.. The DATA Column from match table is the output. The output columns will be ID,type, DATA. The type column in source table should match with the condition in type column in match table and the DATA is the output from match table. – praveen muppala Nov 15 '22 at 21:23
  • The missing keyword error is probably because you put your `WHERE` before your `FROM`, but your `case` statement also can't have those `in`s in it. – EdmCoff Nov 15 '22 at 21:28
  • 1
    How general does your solution actually need to be? Does it need to potentially handle `OR`s instead of `AND`s? Does it need to handle more than 1 or 2 conditions (e.g. 3 or 4)? Can you restructure that `match` table so it doesn't require parsing? – EdmCoff Nov 15 '22 at 21:32
  • This is the sample data but the actual data comes from the client and we cannot restructure that match table. It has only 2 types. Not %% and %%. So if the type in source table matches then the type condition in the match table then it should display the data.. I am sure my case statement is wrong there.. Because we cannot use the comparison but I am out of options and just need to put something out there in case my tables are not clear – praveen muppala Nov 15 '22 at 22:43
  • Just because the data comes from a third-party doesn't mean you need to store it that way, but I'll assume you have to work with that terrible structure. I still don't think you answered my question about whether there can be more than 2 conditions per row and whether they are always `AND`ed (not `OR`ed) together. – EdmCoff Nov 15 '22 at 22:57
  • The max conditions is 2 (Example - NOT %DEF% AND NOT %IJK%). There is no OR only AND. – praveen muppala Nov 15 '22 at 22:59

3 Answers3

2

The biggest problem with your attempted query seems to be that SQL requires the WHERE clause to come after the FROM clause.

But your query is flawed in other ways as well. Although it can have complicated logic within it, including subqueries, a CASE statement must ultimately return a constant. Conditions within it are not applied as if they are in a WHERE clause of the main query (like you appear to be trying to do).

My recommendation would be to not store the match table as you currently are. It seems much preferable to have something that contains each condition you want to evaluate. Assuming that's not possible, I suggest a CTE (or even a view) that breaks it down that way first.

This query (based on Nefreo's answer for breaking strings into multiple rows)...

 SELECT 
  data,
  regexp_count(m.type, ' AND ')  + 1 num,
  CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
  replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
 FROM match m INNER JOIN
  table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ')  + 1) as sys.OdciNumberList)) levels
   ON 1=1

... breaks your match table into something more like:

DATA NUM NEGATE MATCH
NOT ABC AND NOT DEF 2 1 %ABC%
NOT ABC AND NOT DEF 2 1 %DEF%
DEF ONLY 1 0 %DEF%
NOT DEF AND NOT IJK 2 1 %DEF%
NOT DEF AND NOT IJK 2 1 %IJK%

So we now know each specific like condition, whether it should be negated, and the number of conditions that need to be matched for each MATCH row. (For simplicity, I am using match.data as essentially a key for this since it is unique for each row in match and is what we want to return anyway, but if you were actually storing the data this way you'd probably use a sequence of some sort and not repeat the human-readable text.)

That way, your final query can be quite simple:

SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
  ON
   (criteria.negate = 0 AND name.type LIKE criteria.match)
   OR
   (criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id

The conditions in the ON do the appropriate LIKE or NOT LIKE (matches one condition from the CRITERIA view/CTE), and the condition in the HAVING makes sure we had the correct number of total matches to return the row (makes sure we matched all the conditions in one row of the MATCH table).

You can see the entire thing...

WITH criteria AS
(
 SELECT 
  data,
  regexp_count(m.type, ' AND ')  + 1 num,
  CASE WHEN REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value) like 'NOT %' THEN 1 ELSE 0 END negate,
  replace(replace(REGEXP_SUBSTR(m.type,'(.*?)( AND |$)',1,levels.column_value), 'NOT '), ' AND ') match
 FROM match m INNER JOIN
  table(cast(multiset(select level from dual connect by level <= regexp_count(m.type, ' AND ')  + 1) as sys.OdciNumberList)) levels
   ON 1=1
)
SELECT name.id, name.type, criteria.data
FROM name INNER JOIN criteria
  ON
   (criteria.negate = 0 AND name.type LIKE criteria.match)
   OR
   (criteria.negate = 1 AND name.type NOT LIKE criteria.match)
GROUP BY name.id, name.type, criteria.data
HAVING COUNT(*) = MAX(criteria.num)
ORDER BY name.id

... working in this fiddle.

As a one-off, I don't think this is significantly different than the other answer already provided, but I wanted to do this since I think this is probably more maintainable if the complexity of your conditions changes.

It already handles arbitrary numbers of conditions, mixes of NOT and not-NOT within the same row of MATCH, and allows for the % signs (for the like) to be placed arbitrarily (e.g. startswith%, %endswith, %contains%, start%somewhere%end, exactmatch should all work as expected). If in the future you want to add different types of conditions or handle ORs, I think the general ideas here will apply.

EdmCoff
  • 3,506
  • 1
  • 9
  • 9
0

Not knowing the possible other rules for selecting rows, just with your data from the question, maybe you could use this:

WITH 
    tbl_name AS
      (
          Select 1 "ID", '1ABC' "A_TYPE" From Dual Union All
          Select 2 "ID", '2DEF' "A_TYPE" From Dual Union All
          Select 3 "ID", '3DEF' "A_TYPE" From Dual Union All            
          Select 4 "ID", '4IJK' "A_TYPE" From Dual 
      ),
    tbl_match AS
      (
          Select 'NOT %ABC% AND NOT %DEF%' "A_TYPE", 'NOT ABC AND NOT DEF' "DATA" From Dual Union All
          Select '%DEF%'                   "A_TYPE", 'DEF ONLY'            "DATA" From Dual Union All
          Select 'NOT %DEF% AND NOT %IJK%' "A_TYPE", 'NOT DEF AND NOT IJK' "DATA" From Dual 
      )
Select 
    n.ID "ID", 
    n.A_TYPE, 
    m.DATA
From 
    tbl_match m 
Inner Join
    tbl_name n ON (1=1)
Where 
    (
        INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 1) = 0
      AND
        INSTR(m.A_TYPE, 'NOT %' || SubStr(n.A_TYPE, 2) || '%', 1, 2) = 0 
      AND
        Length(m.A_TYPE) > Length(SubStr(n.A_TYPE, 2)) + 2
    )
  OR
    (
        Length(m.A_TYPE) = Length(SubStr(n.A_TYPE, 2)) + 2
      AND
        '%' || SubStr(n.A_TYPE, 2) || '%' = m.A_TYPE
    )
Order By n.ID

Result:

ID A_TYPE DATA
1 1ABC NOT DEF AND NOT IJK
2 2DEF DEF ONLY
3 3DEF DEF ONLY
4 4IJK NOT ABC AND NOT DEF

Any other format of condition should be evaluated separately ...
Regards...

d r
  • 3,848
  • 2
  • 4
  • 15
0
WITH match_cte AS (
    SELECT m.data
          ,m.type
          ,decode(instr(m.type,'NOT')
                 ,1 -- found at position 1
                 ,0
                 ,1) should_find_str_1
          ,substr(m.type
                 ,instr(m.type,'%',1,1) + 1
                 ,instr(m.type,'%',1,2) - instr(m.type,'%',1,1) - 1) str_1
          ,decode(instr(m.type,'NOT',instr(m.type,'%',1,2))
                 ,0 -- no second NOT
                 ,1
                 ,0) should_find_str_2
          ,substr(m.type
                  ,instr(m.type,'%',1,3) + 1
                  ,instr(m.type,'%',1,4) - instr(m.type,'%',1,3) - 1) str_2
      FROM match m
)
SELECT s.id
      ,s.type
      ,m.data
  FROM source s
      CROSS JOIN match_cte m
 WHERE m.should_find_str_1 = sign(instr(s.type,m.str_1))
 AND  (m.str_2 IS NULL
         OR m.should_find_str_2 = sign(instr(s.type, m.str_2))
      )
ORDER BY s.id, m.data

MATCH_CTE |DATA|TYPE|SHOULD_FIND_STR_1|STR_1|SHOULD_FIND_STR_2|STR_2| |-|-|-|-|-|-| |NOT ABC AND NOT DEF|NOT %ABC% AND NOT %DEF%|0|ABC|0|DEF| |DEF|%DEF%|1|DEF|1|NULL| |NOT DEF AND NOT IJK|NOT %DEF% AND NOT %IJK%|0|DEF|0|IJK|