2

I'm using the function called: "REGEXP_LIKE", with next below pattern:

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$

But, I have a column that contain next values to analyze:

REGEXP_LIKE (column_name,'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$')

FRANÞOISVERBEKE@TISCALINET.BE
GENEVIÞVE.DELSOIR@MINFIN.FED.BE
CREVECOEURÆ-OLI@HOTMAIL.COM
HERVÉ.GHILBERT@SKYNET.BE

As you note, all of them contain special character and all of them are considered correct when I use the function with this pattern.

Do you know why, if I'm not specifying the special characters? How can I exclude all special characters with this function and this pattern?

user1037527
  • 49
  • 1
  • 2
  • 5
  • 1
    Others have examined this here (perhaps slightly different question). I would recommend that you look over the answers to this question, http://stackoverflow.com/questions/156430/regexp-recognition-of-email-address-hard. – Patrick Bacon Dec 31 '14 at 13:54
  • @Patrick Great link. But I assume the issue here is with non-ASCII characters matching what seems to be an ASCII-only regexp. – Sylvain Leroux Dec 31 '14 at 14:23
  • Probably not related to the question per se, but it seems to me you have some [mojibake](http://simple.wikipedia.org/wiki/Mojibake) here: `FRANÞOISVERBEKE` should probably be `FRANÇOISVERBEKE`. Same thing for `GENEVIÈVE`... BTW, [rfc6530](https://tools.ietf.org/html/rfc6530) add support for non-ASCII characters in e-mail address. – Sylvain Leroux Dec 31 '14 at 14:24
  • Hi, what do you mean to add support for non-ASCII characters? For now, the values in the table are like this. I cannot modify them, only I need to filter them. – user1037527 Dec 31 '14 at 14:42
  • They are not being filtered out because e.g., the ligature `Æ` is within the range `A-Z` - at least in whatever character set you're using in your database. – David Faber Dec 31 '14 at 16:45
  • I agree with @SylvainLeroux that it is odd that the thorn character `Þ` and the ligature `Æ` are showing here. But it is possible that whatever transformation occurred happened before the data were inserted into the database. – David Faber Jan 02 '15 at 14:19

1 Answers1

3

I am not entirely sure you can accomplish this within your regular expression. However, you could add an additional filter as below:

SELECT * FROM table_name
 WHERE REGEXP_LIKE(column_name,'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$')
   AND REPLACE(TRANSLATE(LOWER(column_name), 'abcdefghijklmnopqrstuvwxyz0123456789@+.-_%','zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz'),'z') IS NOT NULL

The TRANSLATE() function will replace all of the "regular" letters (plus the characters ordinarily allowed in email addresses; I think I've gotten them all) with 'z's; the REPLACE() function replaces these with nothing; if the resulting string IS NOT NULL then there are "special" characters.

I could not confirm that this actually works since the character set in my database is ASCII and doesn't return "special" characters for the regex. But I confirmed that the REPLACE(TRANSLATE()) clause does work:

WITH t1 AS (
    SELECT 'FRANÞOISVERBEKE@TISCALINET.BE' AS mycolumn FROM dual
)
SELECT mycolumn
     , REPLACE(TRANSLATE(LOWER(mycolumn),'abcdefghijklmnopqrstuvwxyz0123456789@+.-_%','zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz'),'z') AS mynewcolumn
  FROM t1
 WHERE REPLACE(TRANSLATE(LOWER(mycolumn),'abcdefghijklmnopqrstuvwxyz0123456789@+.-_%','zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz'),'z') IS NOT NULL

Result:

MYCOLUMN                       MYNEWCOLUMN
FRANÞOISVERBEKE@TISCALINET.BE  þ
David Faber
  • 12,277
  • 2
  • 29
  • 40