0

I have a regular expression like the following: (Running on Oracle's regexp_like(), despite the question isn't Oracle-specific)

abc|bcd|def|xyz

This basically matches a tags field on database to see if tags field contains abc OR bcd OR def OR xyz when user has input for the search query "abc bcd def xyz". The tags field on the database holds keywords separated by spaces, e.g. "cdefg abcd xyz"

On Oracle, this would be something like:

select ... from ... where 
   regexp_like(tags, 'abc|bcd|def|xyz');

It works fine as it is, but I want to add an extra option for users to search for results that match all keywords. How should I change the regular expression so that it matches abc AND bcd AND def AND xyz ?

Note: Because I won't know what exact keywords the user will enter, I can't pre-structure the query in the PL/SQL like this:

select ... from ... where 
   tags like '%abc%' AND
   tags like '%bcd%' AND
   tags like '%def%' AND
   tags like '%xyz%';
Gary_W
  • 9,933
  • 1
  • 22
  • 40
user2526586
  • 972
  • 2
  • 12
  • 27
  • I don't think you can do this with Oracle regexp. In PCRE you can do it with lookaheads (see https://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator/470602#470602) but Oracle doesn't support that. – Barmar Nov 07 '22 at 18:16
  • I would like to suggest that in your OR example you tighten up the match criteria so that the string is preceded by a space or the start of the line, and proceded by a space or the end of the line, thus eliminating conditions where the string may be part of a larger word. Especially if you are matching user input, who knows what they will enter, bless their hearts. `REGEXP_LIKE(tags, '(^| )(abc|bcd|def|xyz)( |$)');` – Gary_W Nov 07 '22 at 20:05

3 Answers3

1

You can split the input pattern and check that all the parts of the pattern match:

SELECT t.*
FROM   table_name t
       CROSS APPLY(
         WITH input (match) AS (
           SELECT 'abc bcd def xyz' FROM DUAL
         )
         SELECT 1
         FROM   input
         CONNECT BY LEVEL <= REGEXP_COUNT(match, '\S+')
         HAVING COUNT(
                  REGEXP_SUBSTR(
                    t.tags,
                    REGEXP_SUBSTR(match, '\S+', 1, LEVEL)
                  )
                ) = REGEXP_COUNT(match, '\S+')
       )

Or, if you have Java enabled in the database then you can create a Java function to match regular expressions:

CREATE AND COMPILE JAVA SOURCE NAMED RegexParser AS
import java.util.regex.Pattern;

public class RegexpMatch {
  public static int match(
    final String value,
    final String regex
  ){
    final Pattern pattern = Pattern.compile(regex);

    return pattern.matcher(value).matches() ? 1 : 0;
  }
}
/

Then wrap it in an SQL function:

CREATE FUNCTION regexp_java_match(value IN VARCHAR2, regex IN VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA NAME 'RegexpMatch.match( java.lang.String, java.lang.String ) return int';
/

Then use it in SQL:

SELECT *
FROM   table_name
WHERE  regexp_java_match(tags, '(?=.*abc)(?=.*bcd)(?=.*def)(?=.*xyz)') = 1;
MT0
  • 143,790
  • 11
  • 59
  • 117
0

Try this, the idea being counting that the number of matches is == to the number of patterns:

with data(val) AS (
    select 'cdefg abcd xyz' from dual union all
    select 'cba lmnop xyz' from dual
),
targets(s) as (
    select regexp_substr('abc bcd def xyz', '[^ ]+', 1, LEVEL)  from dual
    connect by regexp_substr('abc bcd def xyz', '[^ ]+', 1, LEVEL) is not null
)
select val from data d
join targets t on 
    regexp_like(val,s)
group by val having(count(*) = (select count(*) from targets))
;

Result:

cdefg abcd xyz
p3consulting
  • 2,721
  • 2
  • 12
  • 10
0

I think dynamic SQL will be needed for this. The match all option will require individual matching with logic to ensure every individual match is found.

An easy way would be to build a join condition for each keyword. Concatenate the join statements in a string. Use dynamic SQL to execute the string as a query.

The example below uses the customer table from the sample schemas provided by Oracle.

DECLARE

  -- match string should be just the values to match with spaces in between
  p_match_string           VARCHAR2(200) := 'abc bcd def xyz';
  -- need logic to determine match one (OR) versus match all (AND)
  p_match_type             VARCHAR2(3) := 'OR';
  l_sql_statement          VARCHAR2(4000);
  -- create type if bulk collect is needed
  TYPE t_email_address_tab IS TABLE OF customers.EMAIL_ADDRESS%TYPE INDEX BY PLS_INTEGER;
  l_email_address_tab      t_email_address_tab;

BEGIN

  WITH sql_clauses(row_idx,sql_text) AS
  (SELECT 0 row_idx -- build select plus beginning of where clause
         ,'SELECT email_address '
         || 'FROM customers '
         || 'WHERE 1 = '
         || DECODE(p_match_type, 'AND', '1', '0') sql_text
   FROM DUAL
   UNION
   SELECT LEVEL row_idx -- build joins for each keyword
         ,DECODE(p_match_type, 'AND', ' AND ', ' OR ') 
          || 'email_address'
          || ' LIKE ''%' 
          || REGEXP_SUBSTR( p_match_string,'[^ ]+',1,level) 
          || '%''' sql_text
   FROM   DUAL
   CONNECT BY LEVEL <= LENGTH(p_match_string) - LENGTH(REPLACE( p_match_string, ' ' )) + 1
  )
  -- put it all together by row_idx
  SELECT LISTAGG(sql_text, '') WITHIN GROUP (ORDER BY row_idx)
  INTO l_sql_statement
  FROM sql_clauses;

  dbms_output.put_line(l_sql_statement);

  -- can use execute immediate (or ref cursor) for dynamic sql
  EXECUTE IMMEDIATE l_sql_statement
  BULK COLLECT
  INTO   l_email_address_tab;

END;
Variable Value
p_match_string abc bcd def xyz
p_match_type AND
l_sql_statement SELECT email_address FROM customers WHERE 1 = 1 AND email_address LIKE '%abc%' AND email_address LIKE '%bcd%' AND email_address LIKE '%def%' AND email_address LIKE '%xyz%'
Variable Value
p_match_string abc bcd def xyz
p_match_type OR
l_sql_statement SELECT email_address FROM customers WHERE 1 = 0 OR email_address LIKE '%abc%' OR email_address LIKE '%bcd%' OR email_address LIKE '%def%' OR email_address LIKE '%xyz%'