I'm trying to create an email validation regex with some exclusions. Basically, it should ignore email addresses in the following formats, filtering out all city governments and schools: user@ci...us and user@..[a-z]{2}.us

This solution mentions negated character classes [^], alternation |, and the end-of-string anchor $. How can I rewrite the following, which works, without using any lookahead?

[a-zA-Z_0-9.-]+<@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru)

In a few of the regex validators, this works using standard negative lookaheads:

(?!.*\@ci\..+?\.us$)(?!.*\@*\..+?\.ca.us$)([a-zA-Z_0-9.-]+@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(us|info|to|br|bid|cn|ru))
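To make the intended behaviour concrete, here is a minimal sketch that runs the lookahead pattern in Python's re module. It assumes Python as a stand-in for whatever engine the filter actually targets, closes the outer capturing group so the pattern compiles, and uses made-up sample addresses; the helper name matches_with_lookahead is hypothetical.

```python
import re

# Pattern quoted from the question, with the outer capturing group closed.
LOOKAHEAD_PATTERN = re.compile(
    r"(?!.*\@ci\..+?\.us$)"     # reject anything@ci.<...>.us
    r"(?!.*\@*\..+?\.ca.us$)"   # reject <...>.<...>.ca.us (the dot before "us" is unescaped)
    r"([a-zA-Z_0-9.-]+@[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?"
    r"\.(us|info|to|br|bid|cn|ru))"
)


def matches_with_lookahead(address: str) -> bool:
    """Return True when the pattern matches from the start of the address."""
    return LOOKAHEAD_PATTERN.match(address) is not None


if __name__ == "__main__":
    # Hypothetical sample addresses, not taken from real traffic.
    for sample in (
        "user@mail.example.info",
        "user@city.example.us",
        "clerk@ci.springfield.us",
        "info@ci.boston.ma.us",
        "teacher@lincoln.k12.ca.us",
    ):
        print(sample, matches_with_lookahead(sample))
```

Note that the main part of the pattern needs at least three domain labels, so a two-label address such as user@example.us would not match regardless of the lookaheads.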

My first attempts led me to this solution, which only partially handles the first lookahead (I tightened the top-level domain with the word boundary \b):

[a-zA-Z_0-9.-]+<@([^c"][^"]+|c[^i"][^+]+|ci[^i"][^i"])[a-zA-Z_0-9-]+?\.+[a-zA-Z_0-9.-]+?\.(\bus\b|\binfo\b|\bto\b|\bbr\b|\bbid\b|\bcn\b|\bru\b|\bu\b)
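For comparison, one lookahead-free way to say "the first domain label is not ci" is to enumerate the allowed labels by length with negated ranges and alternation. The sketch below is not the pattern from this thread; it is a simplified illustration that assumes lowercase input, a plain @ separator, and the TLD list from the question, and it handles only the ci exclusion, not the state-level school domains. All names in it are hypothetical.

```python
import re

# A first label that is anything except exactly "ci".
FIRST_LABEL_NOT_CI = (
    r"(?:"
    r"[a-z0-9_-]{3,}"             # three or more characters: cannot be "ci"
    r"|[abd-z0-9_-][a-z0-9_-]?"   # one or two characters, not starting with "c"
    r"|c[a-hj-z0-9_-]?"           # "c" alone, or "c" plus anything but "i"
    r")"
)

NO_LOOKAHEAD_PATTERN = re.compile(
    r"[a-z0-9_.-]+@"
    + FIRST_LABEL_NOT_CI
    + r"\.[a-z0-9_.-]+\.(?:us|info|to|br|bid|cn|ru)"
)


def matches_without_lookahead(address: str) -> bool:
    """Match the whole address after lowercasing; no lookahead is used."""
    return NO_LOOKAHEAD_PATTERN.fullmatch(address.lower()) is not None
```

Because the label must be followed immediately by a literal dot, ci itself can never satisfy any branch, while labels such as c, cx, or city still pass.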
RobbieTheK

1 Answer

This is as close as I could get but it gets me there:

[a-zA-Z_0-9.-]+<@([^c]+|c(c|ic)+([^ic]|i[^c][^ic]))+(c(c|ic)+(i|ic))+?\.+[a-zA-Z_0-9.-]+?\.(\bus\b|\binfo\b|\bto\b|\bbr\b|\bbid\b|\bcn\b|\bru\b|\buss\b|\bbw\b|\bu\b)

To clarify, I was looking to exclude locality domains and affinity namespaces for public school districts. The above will exclude something@ci.subdomain.us (as well as spam-prone TLDs ending in .info|to|br|bid|cn|ru|uss|bw|u), but as it turns out, email addresses from those domains usually have four levels/parts, e.g., info@ci.boston.ma.us. This blog entry gave me a clue.
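If the check can live in code rather than in a single regex, the four-level shape is easy to test by splitting the domain into labels. This swaps the regex technique for plain string handling, so it is only a sketch under that assumption; the function name is hypothetical, and the rule it encodes (ci, then a city, then a two-letter state, then us) is my reading of the example address above.

```python
def is_city_locality_address(address: str) -> bool:
    """Return True for addresses shaped like user@ci.<city>.<state>.us."""
    _, sep, domain = address.rpartition("@")
    if not sep:
        return False  # no "@" at all, so there is no domain to inspect
    labels = domain.lower().split(".")
    return (
        len(labels) == 4
        and labels[0] == "ci"
        and len(labels[2]) == 2
        and labels[2].isascii()
        and labels[2].isalpha()
        and labels[3] == "us"
    )


if __name__ == "__main__":
    print(is_city_locality_address("info@ci.boston.ma.us"))
    print(is_city_locality_address("something@ci.subdomain.us"))
```

The three-level form something@ci.subdomain.us is deliberately not flagged here, which mirrors the distinction between three and four levels drawn above.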

RobbieTheK