
The regex below:

EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

is what I initially used to validate email format. After finding that an address like "name@email...com" was passing my tests, I copied and pasted a different regex that rejects consecutive periods in the domain. It looks like this:

EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(?:\.[a-z\d\-]+)*\.[a-z]+\z/i

The main difference is the piece of regex below:

(?:\.[a-z\d\-]+)

I can't quite figure out how this bit works. Can someone break it down for me?

sawa
Ponchooo
  • You're doing it wrong and excluding a lot of legitimate domains. The only thing we can presume about email addresses these days is that they contain an `@`. The rest is *extremely* hazy. `x@co` is potentially a valid email address, as is `x@გე`. – tadman Oct 02 '14 at 19:06
  • Honestly, given how crazy email addresses are, the best way to validate one is to **try and deliver it**. If it succeeds, it's valid. If it fails, it doesn't matter if it's able to pass a regular expression, it's garbage. – tadman Oct 02 '14 at 19:10
  • Don't reinvent the wheel. – sawa Oct 02 '14 at 19:12
  • None of you have answered my question. – Ponchooo Oct 02 '14 at 19:13
  • I'm only being stubborn here because many people get email validation **horribly** wrong and this infuriates people with legitimate, standards compliant addresses. Unicode domains are a thing, so having `[a-z]` in your TLD matcher is going to be a huge problem. – tadman Oct 02 '14 at 19:16
  • While you might *think* they didn't answer your question, they actually did very well. You can't *validate* an address with a regular expression like you're using. As @tadman said, "try and deliver it" if you want to know whether it's valid. If someone responds it's at least valid; It might not belong to whoever claimed it was theirs, but at least it was valid. Read "[Using a regular expression to validate an email address](http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address)" for a much better discussion. – the Tin Man Oct 02 '14 at 20:58
  • See "[Mail::RFC822::Address: regexp-based address validation](http://ex-parrot.com/~pdw/Mail-RFC822-Address.html)" if you want a valid pattern for testing an email address. That still won't prove whether the address is good, it'll only test whether the address met the spec. – the Tin Man Oct 02 '14 at 21:03

3 Answers


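The problem with your original regular expression is that the dot sits inside the repeated character class, so any run of consecutive dots is accepted. Simplified to letters and dots, the domain part looks like this: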

/[a-z\.]+\.[a-z]+\z/

To fix this you need to make your repeating pattern more specific in terms of structure:

/(?:[a-z]+\.)+[a-z]+\z/

That means one or more repetitions of "letters followed by a dot", then a final run of letters. Since every dot must be preceded by at least one letter, multiple dots in a row are excluded.
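As a rough check of the two simplified patterns above, here is a minimal sketch. The `\A` anchors, the constant names, and the sample domains are assumptions added for illustration; they are not part of the original patterns.

```ruby
# Simplified domain patterns from this answer (letters and dots only).
# \A is added here so that the whole string has to match.
LOOSE_DOMAIN  = /\A[a-z\.]+\.[a-z]+\z/       # dot is inside the repeated class
STRICT_DOMAIN = /\A(?:[a-z]+\.)+[a-z]+\z/    # every dot must follow 1+ letters

# Hypothetical sample domains, chosen only to show the difference.
samples = ["email.com", "mail.example.com", "email...com", "a..b.com"]

samples.each do |domain|
  loose  = LOOSE_DOMAIN.match?(domain)
  strict = STRICT_DOMAIN.match?(domain)
  puts "#{domain.ljust(18)} loose=#{loose} strict=#{strict}"
end
```

Under these assumptions, both patterns accept the first two samples, while only the loose pattern accepts the two samples containing consecutive dots.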

Do keep in mind that email addresses are getting increasingly insane with the introduction of new gTLDs that are often used without any sort of prefix. That is, `example@google` may be a valid address in the future. You can't expect there to be a dot in the domain.

tadman

You have `[a-z\d\-]+(?:\.[a-z\d\-]+)*`. The leading `[a-z\d\-]+` ensures that this part of the string starts with a sequence of at least one non-period character. Only one period is allowed per `(?:\.[a-z\d\-]+)` group. In each `(?:\.[a-z\d\-]+)`, the period `\.` is necessarily followed by `[a-z\d\-]+`, which matches at least one non-period character. This ensures that whenever a period appears, it has at least one non-period character on its left and on its right. In other words, consecutive periods are not allowed.
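To make the breakdown concrete, here is a sketch of the second regex from the question rewritten in Ruby's extended mode (the `x` flag), which ignores whitespace in the pattern and lets each piece carry a comment. The layout and the sample addresses are illustrative assumptions; the pattern itself is meant to be equivalent to the one in the question.

```ruby
# The question's second regex, spread out with the x flag for readability.
EMAIL_REGEX = %r{
  \A
  [\w+\-.]+            # local part: word characters, plus, hyphen, or period
  @
  [a-z\d\-]+           # first domain label: at least one non-period character
  (?:                  # start of a non-capturing group (groups, saves nothing)
    \.                 #   one literal period
    [a-z\d\-]+         #   followed by at least one non-period character
  )*                   # the whole group may repeat zero or more times
  \.[a-z]+             # final period and a letters-only TLD
  \z
}xi

# Hypothetical sample addresses, chosen only for illustration.
["name@email.com", "name@mail.example.com", "name@email...com"].each do |addr|
  puts "#{addr}: #{EMAIL_REGEX.match?(addr)}"
end
```

The `?:` only tells the engine not to capture what the group matches; the parentheses are there so that `*` applies to "period plus label" as a unit.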

sawa

Notice that in this subexpression:

(?:\.[a-z\d\-]+)

The character class `[a-z\d\-]` does not contain a period. The expression requires at least one (`+`) of those characters after the period (`\.`) in order to match. Therefore, a series of periods with no letters, digits, or hyphens between them cannot be matched by repeating the subexpression.
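The subexpression can also be tried in isolation. The following is a small sketch, assuming the group is anchored at both ends so that its repetitions must consume the whole string; the constant name, the anchors, and the sample strings are additions for illustration.

```ruby
# The repeated group on its own, anchored so it must cover the entire string.
DOTTED_LABELS = /\A(?:\.[a-z\d\-]+)*\z/i

# Hypothetical inputs: two well-formed runs of ".label", two malformed ones.
[".example", ".mail.example", "..example", ".example."].each do |s|
  puts "#{s.inspect}: #{DOTTED_LABELS.match?(s)}"
end

# scan shows how a well-formed string splits into one match per repetition.
p ".mail.example".scan(/\.[a-z\d\-]+/i)   # => [".mail", ".example"]
```

The first two inputs match and the last two do not, because in each failing case some period is not followed by a letter, digit, or hyphen.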

Mark Reed