2

I am using RLIKE to find some email domains with mysql.

Here is the Query:

SELECT something 
FROM table1 
WHERE SUBSTRING_INDEX(table1.email, "@", -1) RLIKE "test1.com"|"test2.com"

This matched all the email domains with numbers in, example:

aaa@domain0.com

Any idea why?

EDIT: I also noticed that it finds email domains that has at least two successive digits.

Really strange.

Milos Cuculovic
  • 19,631
  • 51
  • 159
  • 265
  • Can you post a sample set of rows to match? I think what has happened is the `|` is interpreted as [a bitwise OR](https://dev.mysql.com/doc/refman/5.0/en/non-typed-operators.html) because the outer regexp is not enclosed in single quotes. That probably caused the entire regex to be cast to the string `"0"`. – Michael Berkowski Jul 01 '15 at 13:25

1 Answers1

4

The string supplied to RLIKE or REGEXP needs to be a quoted string, wherein the entire regular expression is single-quoted. What you have are two double-quoted strings separated by |, which is the bitwise OR operator.

That is causing the whole expression to be evaluated as 0, and that's why the domain aaa@domain0.com is matched:

# The unquoted | evaluates this to zero:
mysql> SELECT "string" | "string";
+---------------------+
| "string" | "string" |
+---------------------+
|                   0 |
+---------------------+

# And zero matches domain0.com
mysql> SELECT 'domain0.com' RLIKE '0';
+-------------------------+
| 'domain0.com' RLIKE '0' |
+-------------------------+
|                       1 |
+-------------------------+

Instead, you would need to use RLIKE with a single-quoted string, and backslash-escape the .. I'm also adding ^$ anchors so substrings are not matched.

WHERE SUBSTRING_INDEX(table1.email, "@", -1) RLIKE '^test1\.com$|^test2\.com$'

It could also be expressed as '^(test1\.com|test2\.com)$'. The trick is that | has very low precedence so you need to ensure both ends are anchored for every possible string you want to match.

However, if you are just trying to match a list of domains, it is far easier to do it with IN () so you may merely list them:

WHERE SUBSTRING_INDEX(table1.email, "@", -1) IN ('test1.com', 'test2.com', 'test4.org')
Michael Berkowski
  • 267,341
  • 46
  • 444
  • 390