0

I'm trying to build a regexp for email validation on both parts; local-part and Domain-part, respectively:

Local-Part: ^[A-Z0-9][A-Z0-9._%+-]{0,63} - The maximum total length of the local-part of an email address is 64 octets.

Domain-Part: (([A-Z0-9]{1,63})[A-Z0-9]+(-[A-Z0-9]+)*\.){1,8}[A-Z]{2,63}$

Regexp : ^[A-Z0-9][A-Z0-9._%+-]{0,63}@(([A-Z0-9]{1,63})[A-Z0-9]+(-[A-Z0-9]+)*\.){1,8}[A-Z]{2,63}$

I'm satisfied with Domain Part. But I need to ensure some rules on Local-part that I'm not being able to achieve, regarding Dots '.'.

Dot's Rule : provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed).

On my regexp I already guarantee that Dot is not first char. I need to check for consecutive Dots (e.g. '..' not allowed but ".." is allowed, and that '.@' must not happen too).

Any help please?

  • `([A-Z0-9]{1,63})[A-Z0-9]+` matches an infinite length string – Toto Jan 23 '20 at 13:02
  • Please, have a look at these sites: [TLD list](https://www.iana.org/domains/root/db); [valid/invalid addresses](https://en.wikipedia.org/wiki/Email_address#Examples); [regex for RFC822 email address](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) – Toto Jan 23 '20 at 13:03
  • Thank you Toto. I'll check your sugestion –  Jan 23 '20 at 14:28

2 Answers2

0

Add a second check that the email does not match any number of unquoted, non-at characters and then two consecutive dots.

SELECT *
FROM   your_table
WHERE  REGEXP_LIKE(
         email,
         -- your regular expression
         '^[A-Z0-9][A-Z0-9._%+-]{0,63}@(([A-Z0-9]{1,63})[A-Z0-9]+(-[A-Z0-9]+)*\.){1,8}[A-Z]{2,63}$'
       )
AND    NOT REGEXP_LIKE( email, '^[^"@]+\.\.' )

However, I feel that you would be better off not using your regular expression (since it does not accept quotes or extended character sets and if you can get it working in Oracle use this one) or, even better, just accepting whatever has been entered and sending a confirmation e-mail to the user as this not-only checks that the e-mail is valid syntactically but also checks that the e-mail exists and that the user wants whatever service you are providing.


Update:

To use the regular expression in the answer linked above:

CREATE TABLE test_data ( id, email ) AS
SELECT  1, 'abc@example.com' FROM DUAL UNION ALL
SELECT  2, 'abc.def@example.com' FROM DUAL UNION ALL
SELECT  3, 'abc..def@example.com' FROM DUAL UNION ALL
SELECT  4, 'abc.def.@example.com' FROM DUAL UNION ALL
SELECT  5, '".abc.."@example.com' FROM DUAL UNION ALL
SELECT  6, 'abc.def++yourdomain.com@example.com' FROM DUAL UNION ALL
SELECT  7, '"with\"quotes\""@example.com' FROM DUAL UNION ALL
SELECT  8, '""@example.com' FROM DUAL UNION ALL
SELECT  9, '"\' || CHR(9) || '"@example.com' FROM DUAL UNION ALL
SELECT 10, '"""@example.com' FROM DUAL UNION ALL
SELECT 11, '123456789.123456789.123456789.123456789.123456789.123456789.1234567890@example.com' FROM DUAL UNION ALL
SELECT 12, 'ABC@example.com' FROM DUAL;

Query:

SELECT *
FROM   test_data
WHERE  REGEXP_LIKE(
         email,
         '^('
           -- Unquoted local-part
           || '[a-z0-9!#$%&''*+/=?^_{|}~-]+(\.[a-z0-9!#$%&''*+/=?^_{|}~-]+)*'
           --                               ^^
           --                               Allow a dot but always expect a
           --                               non-dot after it.
           -- Quoted local-part
           || '|"('
             -- Unescaped characters in the quotes
             || '[]' || CHR(1) || '-' || CHR(8) || CHR(11) || CHR(12) || CHR(14) || '-!#-[^-'||CHR(127)||']'
             -- Escaped characters in the quotes
             || '|\\[' || CHR(1) || '-' || CHR(9) || CHR(11) || CHR(12) || CHR(14) || '-' || CHR(127) || ']'
             || ')*"'
           || ')'
           -- Match at symbol at end of local-part
           || '@',
           -- Case insensitive
           'i'
       )

Output:

ID | EMAIL                                                                             
-: | :---------------------------------------------------------------------------------
 1 | abc@example.com                                                                   
 2 | abc.def@example.com                                                               
 5 | ".abc.."@example.com                                                              
 6 | abc.def++yourdomain.com@example.com                                               
 7 | "with\"quotes\""@example.com                                                      
 8 | ""@example.com                                                                    
 9 | "\        "@example.com                                                           
11 | 123456789.123456789.123456789.123456789.123456789.123456789.1234567890@example.com
12 | ABC@example.com                                                                   

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Many thanks MTO. Forgeting the Quotes; what changes would I have to make, to detect consecutive dots and '. @'? –  Jan 23 '20 at 14:34
  • I don't think you can make it do consecutive dots and have a 64 character limit on the local part (since Oracle does not support look-ahead/-behind); if you want both then you need two separate checks. – MT0 Jan 23 '20 at 18:08
  • Thank you MTO. I agree with you. I'll use separate checks. Can you help me please on including in the regexp local-part '^[A-Z0-9][A-Z0-9._%+-]{0,63}@' the ability to check for consecutive Dot's and Dot @; both '..' or '.@' they are not allowed. The actual regex only prevents the local-part from starting with Dot as first character. –  Jan 24 '20 at 09:57
  • @LEOPOLDO As linked in the question, [this answer](https://stackoverflow.com/a/201378/1509264) has a regular expression and you can extract the local part from it: `^([a-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"([\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@` – MT0 Jan 24 '20 at 10:37
  • Many thanks. I'm kind of rookie in regexes, and the one above seems to be very complex. Can you tell me witch part of the above regex is checking for consecutive dots and Dot as last character before the '@'? I thank you in advance. –  Jan 24 '20 at 11:23
  • @LEOPOLDO The regular expression is not checking for the absence of two consecutive dots; its enforcing that if there is a dot then there must be a non-dot character after it in the non-quoted syntax for the local part. Its effectively the same thing just approaching it from a different angle. – MT0 Jan 24 '20 at 11:35
-1

Filter using below sql-

like '%_@__%.__%'
KetanC
  • 1
  • 1