2

We're getting fraudulent emails in our database, and are trying to make an alert to find them. Some example of email addresses we are getting:

addisonsdsdsdcfsd@XXXX.com
agustinasdsdfdf@XXXX.com

I want the query to search for:
pattern of consonants and pattern length > 4 characters

Here's what I have so far, I can't figure out how to get it to search for the length of the string. Right now it's catching addresses that have even two consonants back to back, which I want to avoid because that catches emails like bobsaget@xxxx.com.

select * from recips
where address like like '%[^aeiou]@%'

UPDATE

I think there is some misunderstanding of what I am trying to find, this is not a query for validating emails, we are simply trying to spot patterns in our signups for fraudulent emails.

we are searching on other criteria besides this, such as datelastopened/clicked, but for the sake of keeping the question simple I only attached the string that searches for the pattern. We don't mail to anyone who has hardbounced more than once. However, these emails in particular are bots that still find a way to click/open and don't hardbounce. They are also coming from particular sets of IP blocks where the first octets are the same, and these IP blocks vary.

this is by no means our first line of defense, this is just a catch-all to make sure we catch anything that slips through the cracks

  • 4
    I don't think you can invalidate e-mail addresses by something so arbitrary - what if my e-mail address is RobertGriffinXXV@msn.com? You should validate e-mail addresses by e-mailing them a link to click. If they don't click it (or it bounces!) then it's not valid. – Aaron Bertrand Sep 25 '13 at 14:44
  • `lynyrdskynyrd@XXX.com` would be a good example of why matching consonants is not a good idea – Jamie Taylor Sep 25 '13 at 14:44
  • 1
    How are these invalid email address getting into your database? The places to catch invalid Emails is way before you dump them into a database. SQL Server is not nearly the best kind of spam filtering you can get. Not very cost-efficient either. – RBarryYoung Sep 25 '13 at 14:51
  • 2
    Why don't you just validate emails by sending them an email and create a process around activating it once they've clicked a unique expiring link sent to that email address. Then clean up the table based on accounts that weren't activated within the expiry limit. – Tanner Sep 25 '13 at 14:53
  • This link may help you... [http://stackoverflow.com/questions/13698956/email-validation-in-sql-server-2008][1] [1]: http://stackoverflow.com/questions/13698956/email-validation-in-sql-server-2008 – kaushik0033 Sep 25 '13 at 14:56
  • Please go through below link:- http://stackoverflow.com/questions/13698956/email-validation-in-sql-server-2008 – kaushik0033 Sep 25 '13 at 14:57
  • hey guys, updated post with additional explanation – mikenonymous Sep 25 '13 at 15:06

1 Answers1

0

I'd think your current query is finding bobsaget@xxxxx.com because it contains t@ which matches [^aeiouy]@ because that character-class between [] only matches 1 character unless you quantify it like so: [^aeiouy]{4,}@

Maybe that works, but what I'm getting from Googling about the using Regex in the WHERE clause in SQL-Server, you need to define a User Defined Function to do this for you. If that is too cumbersome, maybe doing something like this would do the trick:

WHERE address LIKE '%[^aeiouy][^aeiouy][^aeiouy][^aeiouy]@%'

Side note, just 4 seems strict to me, I know languages where Heinsch would be a valid name. So I'd go for 6 or more I think, in which case it would be [^aeiouy]{6,}@ or repeating the [^aeiouy] part 6 times in the above query.

asontu
  • 4,548
  • 1
  • 21
  • 29