34

This query creates a mysql view that captures bad email address formats in one table. So if a row is inserted in that has rtrrg.com as a email it will be recorded in the view. My question is, how do I make the view track more than one table. A second table.

The SQL

CREATE VIEW `invalid_emails` AS 
  select `table_with_email_column`.`email` AS `invalidemail` 
    from `table_with_email_column` 
   where ((locate(_latin1'', ltrim(rtrim(`table_with_email_column`.`email`))) <> 0) 
      or (left(ltrim(`table_with_email_column`.`email`), 1) = _latin1'@') 
      or (right(rtrim(`table_with_email_column`.`email`), 1) = _latin1'.') 
      or ((locate(_latin1'.', `table_with_email_column`.`email`,locate(_latin1'@', `table_with_email_column`.`email`)) -  locate(_latin1'@', `table_with_email_column`.`email`)) <= 1) 
      or ((length(ltrim(rtrim(`table_with_email_column`.`email`))) -  length(replace(ltrim(rtrim(`table_with_email_column`.`email`)), _latin1'@', _latin1''))) <> 1) 
      or (locate(_latin1'.', reverse(ltrim(rtrim(`table_with_email_column`.`email`)))) < 3) 
      or (locate(_latin1'.@', `table_with_email_column`.`email`) <> 0) 
      or (locate(_latin1'..', `table_with_email_column`.`email`) <> 0));
hol
  • 8,255
  • 5
  • 33
  • 59

9 Answers9

96

You can use a pure SELECT to validate Email Addresses:

SELECT * FROM `users` WHERE `email` NOT REGEXP '^[^@]+@[^@]+\.[^@]{2,}$';

And now for your question of tracking multiple tables, you can use comma seperated table names right?

SELECT * FROM `users`, `customers`, `clients`
         WHERE `email` NOT REGEXP "^[a-zA-Z0-9][a-zA-Z0-9.!#$%&'*+-/=?^_`{|}~]*?[a-zA-Z0-9._-]?@[a-zA-Z0-9][a-zA-Z0-9._-]*?[a-zA-Z0-9]?\\.[a-zA-Z]{2,63}$";
Jacob Thomason
  • 3,062
  • 2
  • 17
  • 21
Praveen Kumar Purushothaman
  • 164,888
  • 24
  • 203
  • 252
  • 2
    `'^[^@]+@[^@]+\.[^@]{2,}$'` is bad because not all symbols are allowed in emails account name. Also `'^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$'` is not correct, because emails account name should start and end with letter or number (not dot or _ or any other symbol). Same issue with domain name. – DarkSide Aug 31 '15 at 09:42
  • 1
    @PraveenKumar it's useful for someone else coming to the answer, even years later, to be aware of the limitations. I have an invalid email address ending with a period, for example, which your regex doesn't catch. This isn't to negate your effort in answering which is appreciated and put me on the right track. – Ivan McA Jan 03 '18 at 09:28
  • @IvanMcA Thanks man... Would you like to update the answer yourself? I'll approve it, this is what Stack Overflow is for and this is how it works. `:)` – Praveen Kumar Purushothaman Jan 03 '18 at 19:58
  • This answer also does not cater for any email addresses which contain an apostrophe, which is a valid address character and not uncommon for people of Irish descent. – Jon Wyatt Jan 05 '18 at 10:42
  • @JonWyatt May be that's not the case 5 years ago. – Praveen Kumar Purushothaman Jan 05 '18 at 19:03
  • @Praveen Kumar Indeed. I'm not saying this wasn't necessarily valid at the time of the original answer, but it should be made to clear to people reading it now (2018) that this is not really a valid solution anymore. – Jon Wyatt Jan 19 '18 at 09:03
  • @JonWyatt Agree with you, but that's common sense. I don't mind adding a note saying this is not valid anymore, but please, people need to think. They can't just say "Windows 3.1 is awesome with UI compared to its predecessors". Do you get it? – Praveen Kumar Purushothaman Jan 22 '18 at 11:01
  • 1
    If you want a much more compliant regex, I've validated the following against a pretty large database of email addresses: ```'^[a-zA-Z0-9][a-zA-Z0-9.!#$%&\'*+-/=?^_`{|}~]*?[a-zA-Z0-9._-]?@[a-zA-Z0-9][a-zA-Z0-9._-]*?[a-zA-Z0-9]?\\.[a-zA-Z]{2,63}$'``` See https://stackoverflow.com/a/2049510/1443533 for details on email address specs. – Jacob Thomason Feb 02 '22 at 06:35
  • Thanks @JacobThomason. Feel free to add as a suggestion and get 2 points. I'll accept your suggestion. – Praveen Kumar Purushothaman Feb 02 '22 at 07:54
17

For the proper email validation, you can use this regex as bellow:

SELECT
    *
FROM
    `school`
WHERE
    `email` NOT REGEXP '^[a-zA-Z0-9][a-zA-Z0-9._-]*[a-zA-Z0-9._-]@[a-zA-Z0-9][a-zA-Z0-9._-]*[a-zA-Z0-9]\\.[a-zA-Z]{2,63}$';
mahemoff
  • 44,526
  • 36
  • 160
  • 222
Sachin Parse
  • 1,269
  • 11
  • 12
  • 6
    This regexp is slightly better ```'^[a-zA-Z0-9][a-zA-Z0-9._-]*@[a-zA-Z0-9][a-zA-Z0-9._-]*\\.[a-zA-Z]{2,4}$'``` because it allows `x@x.com` too. – DarkSide Apr 08 '16 at 14:44
  • 1
    The max length for a tld is 63 characters. So it should end {2,63} – Sam_Benne Mar 16 '18 at 09:29
  • Should allow plus signs before the @ sign as well: '^[a-zA-Z0-9][a-zA-Z0-9._+-]*@[a-zA-Z0-9][a-zA-Z0-9._-]*\\.[a-zA-Z]{2,4}$' (e.g. joe+facebook@example.com) – Codemonkey Aug 02 '18 at 07:04
  • Above is no longer valid. We now have top level domains like .global or .ninja. – barell Sep 14 '18 at 13:48
  • It indeed doesn't handler longer but valid TLDs. It also doesn't handle ".co.uk" and similar domains. Should at least be: `'^[a-zA-Z0-9][a-zA-Z0-9._-]*@[a-zA-Z0-9][a-zA-Z0-9._-]*(\\.[a-zA-Z]{2,4})+$'`. As for a length check, we're quite defenseless with a single regular expression. – Kafoso Dec 08 '20 at 13:08
8

Simple SELECT statement is sufficient, for example:

 SELECT * FROM user WHERE email NOT 
 REGEXP '^[a-zA-Z0-9][+a-zA-Z0-9._-]*@[a-zA-Z0-9][a-zA-Z0-9._-]*[a-zA-Z0-9]*\\.[a-zA-Z]{2,4}$'

This query handles the Gmail addresses with + sign and addresses where the host is a single letter.

Bud Damyanov
  • 30,171
  • 6
  • 44
  • 52
  • It doesn't allow domains with TLDs like .world, .group or .global, which are valid. – Luis Rodriguez Sep 23 '21 at 14:54
  • Set `{2,4}` to `{2,6}` in the query – Bud Damyanov Sep 24 '21 at 18:43
  • That `[a-zA-Z0-9]*` right before the double backslashes makes no sense to me. That list of (0 or more) characters is already matched by the (0 or more) characters before: `[a-zA-Z0-9._-]*`. You probably want to exclude `._-` right before the dot. But then the `*` must be changed to a `+`: `[a-zA-Z0-9]+`. – Michael Härtl Oct 29 '21 at 06:40
1

You can use a UNION in the VIEW but then you have to repeat all the WHERE statement which gives you redundant code. So you would make a helper VIEW that makes you a UNION and then apply the WHERE clause.

Demo here: SQL Fiddle Demo.

That would apply to your SQL somehow like this (untested);

CREATE VIEW `invalid_emails_helper` AS
  select `table_with_email_column`.`email` AS `invalidemail` 
    from `table_with_email_column` 
union
  select `table_with_email_column`.`email` 
    from `second_table_with_email_column` 

CREATE VIEW `invalid_emails` AS 
  select `invalidemail` as `email`
    from `invalid_emails_helper` as `table_with_email_column` 
   where ((locate(_latin1'', ltrim(rtrim(`table_with_email_column`.`email`))) <> 0) 
      or (left(ltrim(`table_with_email_column`.`email`), 1) = _latin1'@') 
      or (right(rtrim(`table_with_email_column`.`email`), 1) = _latin1'.') 
      or ((locate(_latin1'.', `table_with_email_column`.`email`,locate(_latin1'@', `table_with_email_column`.`email`)) -  locate(_latin1'@', `table_with_email_column`.`email`)) <= 1) 
      or ((length(ltrim(rtrim(`table_with_email_column`.`email`))) -  length(replace(ltrim(rtrim(`table_with_email_column`.`email`)), _latin1'@', _latin1''))) <> 1) 
      or (locate(_latin1'.', reverse(ltrim(rtrim(`table_with_email_column`.`email`)))) < 3) 
      or (locate(_latin1'.@', `table_with_email_column`.`email`) <> 0) 
      or (locate(_latin1'..', `table_with_email_column`.`email`) <> 0));

And yes, the query to check the e-mail address using a regex as can easily found everywhere in the internet simplifies it further.

Praveen Kumar Purushothaman
  • 164,888
  • 24
  • 203
  • 252
hol
  • 8,255
  • 5
  • 33
  • 59
0
SELECT
    *
FROM
    users
WHERE
    email NOT REGEXP ‘ ^[ a - zA - Z0 - 9 ][ a - zA - Z0 - 9._ -]*[ a - zA - Z0 - 9 ]@[ a - zA - Z0 - 9 ][ a - zA - Z0 - 9._ -]*[ a - zA - Z0 - 9 ]\.[ a - zA - Z ]{ 2,
    63 } $’
mahemoff
  • 44,526
  • 36
  • 160
  • 222
  • This is slightly better answer than accepted one, but still not perfext. `a@example.com` doesn't validate because there is only 1 symbol before @. Same with domain names. – DarkSide Aug 31 '15 at 09:40
0
select EmailAddress from FindInvalidEmailAddressDemo
   -> where EmailAddress NOT LIKE '%_@_%._%';
Ramil Aliyev 007
  • 4,437
  • 2
  • 31
  • 47
0

With MySQL 9 you can create a check constraint now that will validate all data stored in the email address column. Here is an ALTER TABLE statement:

ALTER TABLE `user` 
ADD CONSTRAINT `user.email_validation` 
    CHECK (`email` REGEXP "^[a-zA-Z0-9][a-zA-Z0-9.!#$%&'*+-/=?^_`{|}~]*?[a-zA-Z0-9._-]?@[a-zA-Z0-9][a-zA-Z0-9._-]*?[a-zA-Z0-9]?\\.[a-zA-Z]{2,63}$");
Jacob Thomason
  • 3,062
  • 2
  • 17
  • 21
0

My solution for finding invalid emails:

SELECT * FROM `tbl_email` WHERE `email` NOT REGEXP '^[a-zA-Z0-9]+[a-zA-Z0-9._-]*@[a-zA-Z0-9]+[a-zA-Z0-9._-]*\.[a-zA-Z0-9]{2,63}$';
JEX725
  • 61
  • 5
0

Once only regexp suggestted doesn't consider all errors, I complete on my own function. See the comments into code.

DELIMITER $$

CREATE OR REPLACE function fn_DOC_Validar_EMail ( pStrEMail VARCHAR(200) )
RETURNS BIT
BEGIN

declare lIntValid bit;

set lIntValid = 0;

SELECT 1 into lIntValid
   Where pStrEMail REGEXP '^[a-zA-Z0-9][a-zA-Z0-9.!#$%&\'*+-/=?^_`{|}~]*?[a-zA-Z0-9._-]?@[a-zA-Z0-9][a-zA-Z0-9._-]*?[a-zA-Z0-9]?\\.[a-zA-Z]{2,63}$'
   and pStrEMail not like '%[^a-z0-9@._-]%' -- not allow characters differents: a-z 0-9 @ . _ -
   and pStrEMail not like '%@%@%' -- not allow  two @
   and pStrEMail not like '%.@%' -- not allow  .@
   and pStrEMail not like '%..%' -- not allow  ..
   and pStrEMail not like '%.' -- not allow  . (dot) at end
   and pStrEMail like '%_@_%_.__%' -- not allow short, i.e.,  a@a.com
   and pStrEMail not LIKE '%^%' -- not allow character ^
   and pStrEMail not LIKE '%\%%' -- not allow character %
;

return lIntValid ;
  
END;

$$

Use the samples bellow to Test your function.

Select Sequencial, email, fn_DOC_Validar_EMail(EMail) from TEMP_Emails;

Create table TEMP_Emails
(
  Sequencial int,
  Email varchar(200)
);

-- invalids
insert into TEMP_Emails values (1, '@teste.com'); -- Start with @
insert into TEMP_Emails values (2, 'josue@teste'); -- with out domain
insert into TEMP_Emails values (3, 'jo ue@teste'); -- espace
insert into TEMP_Emails values (4, 'jo"ue@teste'); -- quotes
insert into TEMP_Emails values (5, 'jo$ue@teste'); -- special Character
insert into TEMP_Emails values (6, 'josue^teste@teste.com'); -- special Character
insert into TEMP_Emails values (7, 'josue]teste@teste.com'); -- special Character
insert into TEMP_Emails values (8, 'josue%teste@teste.com'); -- special Character
insert into TEMP_Emails values (9, 'josue@.teste.com'); -- @. 
insert into TEMP_Emails values (10, 'josue@@teste.com'); -- 2 x @
insert into TEMP_Emails values (11, 'josue@teste@teste.com'); -- 2 x @
insert into TEMP_Emails values (12, 'josue.@teste.com'); -- .@ 
insert into TEMP_Emails values (13, 'josue@teste..com'); -- .. 
insert into TEMP_Emails values (14, 'josue@teste.ad.'); -- . at final
 
 
--OK: Valids
insert into TEMP_Emails values (101, 'josue@teste.com.br');
insert into TEMP_Emails values (102, 'jo.sue@teste.com.br');
insert into TEMP_Emails values (103, 'josue@teste.com');
insert into TEMP_Emails values (104, 'josue.teste@teste.com');
insert into TEMP_Emails values (105, 'josue_teste@teste.com');
insert into TEMP_Emails values (106, 'josue-teste@teste.com');
insert into TEMP_Emails values (107, 'josue@dba-pro.com');
insert into TEMP_Emails values (108, 'josue@dba.pro.com');
insert into TEMP_Emails values (109, 'josue2017@teste.com');
insert into TEMP_Emails values (110, '2022@2022.com');

Select Sequencial, email, fn_DOC_Validar_EMail(EMail) from TEMP_Emails;