0

So I've been given the task of going through our mailing list and "cleaning" the emails which are slightly incorrect (trailing symbols, random symbols accidentally in the middle of the email, syntactically invalid, etc.) and to delete the entries which are totally bogus. The problem is I'm finding some of the emails in the database have already been syntactically corrected but the original entries still exist. There are over 2000 emails which my query is returning as incorrect (it seems past cleanings encountered upwards of 10,000 bad emails!) and going through them one at a time is no good. Does anyone have any suggestions for how to go about this?

Here's the query I used to return the bad emails:

 SELECT id,email
 FROM table
 WHERE NOT
 (
 CHARINDEX(' ',LTRIM(RTRIM([email]))) = 0 
 AND   LEFT(LTRIM([email]),1) <> '@' 
 AND   RIGHT(RTRIM([email]),1) <> '.' 
 AND   CHARINDEX('.',[Email],CHARINDEX('@',[email])) - CHARINDEX('@',[email]) > 1 
 AND   LEN(LTRIM(RTRIM([email]))) - LEN(REPLACE(LTRIM(RTRIM([email])),'@','')) = 1 
 AND   CHARINDEX('.',REVERSE(LTRIM(RTRIM([email])))) >= 3 
 AND   (CHARINDEX('.@',[email]) = 0 AND CHARINDEX('..',[email]) = 0)
 )
 or id in (select id from table where email like '%[+;(,!]%')       
Phil
  • 405
  • 3
  • 14

1 Answers1

0

This question has already been asked (and somewhat answered) on Stack Overflow - T-SQL: checking for email format

I suggest using https://www.rfc-editor.org/rfc/rfc3696 to come up with a solution.

Also, don't assume the second level domain is always a max of four characters - In New Zealand, our longest second level domain is ".parliament.nz". I use a ".geek.nz" second level domain and are constantly having websites tell me it's an invalid email address.

Wikipedias article, http://en.wikipedia.org/wiki/Email_address#Valid_email_addresses is also informative - The following are all valid email addresses

  • niceandsimple@example.com
  • a.little.unusual@example.com
  • much."more\ unusual"@example.com
  • very.unusual."@".unusual.com@example.com
  • very."(),:;<>[]".VERY."very\\ @"very".unusual@cool.example.com
Community
  • 1
  • 1
John
  • 5,672
  • 7
  • 34
  • 52
  • Thanks for the suggestions. I saw the question you referenced, that's where I got my query from. That just returns the emails though, which wasn't my question since I already know how to do that. I will take a look at the paper you suggested and make sure the second level domain check is valid. – Phil Oct 12 '11 at 20:07