-1

I want to write a regex in Ruby that restricts email input to the following rules:

[user name]@[domain name].[top-level domain name]

  1. User name may only contain English letters, numbers, plus signs, hyphens, underscores, dots. The plus sign and dot may not appear consecutively.

  2. User name must contain at least one English letter.

  3. Domain name may only contain English letters, numbers, and hyphens.

  4. Top-level domain name may only contain English letters, numbers, and hyphens. Must end with an English letter.

  5. Domain name and top-level domain name must be separated with dots, and the email must contain at least 1 top-level domain name.

Here is my regex so far:

/\A[a-zA-Z0-9]((?!\.\.)(?!\+\+)[\w\-+.])*[\w\-]@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]*)+[a-zA-Z]\z/

I couldn't find a way to make the user name contain at least one English letter. Is there any way to restrict part of the string before the "@" to follow certain rules?

fanfan
  • 2
    Why do you want to restrict emails with these rules? Why don't you just allow all valid email addresses? – spickermann Feb 02 '22 at 18:20
  • @spickermann I actually thought these restrictions were what a valid email looks like. Are there any official email standards to follow? – fanfan Feb 05 '22 at 12:21

4 Answers

2

Yeah, that's probably not exactly what you wanted :)

But such email validation is not a good idea. People may have an email address that contains no English letters, or that contains characters you haven't even thought of.

By limiting input with a regexp, you inconvenience your users.

Therefore, I believe the main criterion is the presence of @. If the address is valid, the user will receive the email; if not, they won't. It's quite simple :)

The only way to really validate an email address is to send a message to it and receive confirmation.

Look at what regular expressions can be used for email validation:

https://emailregex.com/#crayon-5dcf0d9dc15ec916764848

Or you can use the regexp built into Ruby; just reference the constant

URI::MailTo::EMAIL_REGEXP

But perhaps checking for @ is enough.
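
A minimal sketch of that built-in check, assuming only the standard library's uri. The helper name plausible_email? is made up for illustration, and a passing check only means the string looks like an address, not that mail can be delivered to it:

```ruby
require "uri"

# Hypothetical helper: true when the string matches Ruby's built-in pattern.
def plausible_email?(address)
  address.match?(URI::MailTo::EMAIL_REGEXP)
end

puts plausible_email?("user+tag@example.co.uk") # => true
puts plausible_email?("user.example.com")       # => false (no @)
```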

mechnicov
1

In fact, you want something like an "and" condition here: the part before @ may include only certain valid symbols AND must contain certain symbols at the same time.

With regular expressions, the way to model this is a positive lookahead:

s1 = "123@foo.bar"
s2 = "a123@foo.bar"
s3 = "123a@foo.bar"

s1.match?(/(?=[a-zA-Z])\w+@/) # => false
s2.match?(/(?=[a-zA-Z])\w+@/) # => true
s3.match?(/(?=[a-zA-Z])\w+@/) # => true

I simplified the pattern dramatically for brevity, but the important part is (?=[a-zA-Z]): it checks for a letter without "consuming" the input, so the pattern that follows is matched starting from the very same position. Because this simplified pattern is unanchored, the engine tries every starting position, so it matches whenever a letter begins a run of word characters that ends at @.
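
If the pattern is anchored with \A, as in the question, the lookahead has to scan forward itself. Here is a rough sketch of that variant; the constant name and the character class [\w+.-] are my own simplification, not the full set of rules from the question:

```ruby
# Anchored variant: from the start of the string, skip any non-@ characters
# and require a letter before reaching "@".
LOCAL_HAS_LETTER = /\A(?=[^@]*[a-zA-Z])[\w+.-]+@/

%w[123@foo.bar a123@foo.bar 123a@foo.bar 1.a@foo.bar].each do |s|
  puts "#{s} => #{s.match?(LOCAL_HAS_LETTER)}"
end
# 123@foo.bar => false
# a123@foo.bar => true
# 123a@foo.bar => true
# 1.a@foo.bar => true
```

Since [^@]* cannot cross the @, a letter in the domain does not satisfy the lookahead.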

Konstantin Strukov
1

A valid email address can look quite different from what you described. The local part in front of the @ could include all these characters too: !#$%&'*+-/=?^_`{|}~. Or instead of the domain, there might just be an IP address after the @. And keep in mind that domains do not necessarily need to include a dot. And what about 我買@屋企.香港? Yes, it is a valid email address permitted by RFC 6530. You will find other surprising example email addresses on Wikipedia: Email address.

All these rules make a regexp implementing RFC 822 quite complex and impractical to use. This answer might be interesting to you in this context too.

Therefore I suggest a much simpler regexp: \A.+@.+\z, and then asking the user to confirm their email address. Or you might simply want to use the regexp that comes with Ruby (URI::MailTo::EMAIL_REGEXP) or with Devise (Devise.email_regexp) if you are using it.
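
For illustration, here is how the simple pattern behaves on a few inputs. The constant name LOOSE_EMAIL is just a placeholder I picked:

```ruby
# Loose check: at least one character on each side of an "@".
LOOSE_EMAIL = /\A.+@.+\z/

puts LOOSE_EMAIL.match?("我買@屋企.香港") # => true
puts LOOSE_EMAIL.match?("user@localhost")  # => true (no dot needed)
puts LOOSE_EMAIL.match?("no-at-sign")      # => false
puts LOOSE_EMAIL.match?("@example.com")    # => false (empty local part)
```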

spickermann
0

The string meets the requirements if and only if it matches the regular expression

/\A(?=[^@]*[a-z])(?![^@]*(?:\+\.|\.\+))[a-z\d+_.-]+@[a-z\d-]+\.[a-z\d-]*[a-z]\z/i

Rubular demo<¯\(ツ)>PCRE demo

I've included the PCRE demo (at regex101.com) because considerably more information is provided at that link. (For this regex the PCRE engine is compatible with Ruby's.) For example, hover the cursor over each part of the regex at the PCRE link and you will be provided with an explanation of its function.

Note that at both links I've replaced the beginning and end of string anchors (\A and \z) with beginning and end of line anchors (^ and $), and replaced [^@] with [^@\n] in order to demonstrate the regex for a variety of strings, only the first being valid.

The regular expression can be made self-documenting by defining it in free-spacing mode (in which whitespace and comments are stripped out before the expression is parsed):

/
\A           # match beginning of string
(?=          # begin a positive lookahead
  [^@]*      # match zero or more chars other than '@'
  [a-z]      # match a letter
)            # end positive lookahead
(?!          # begin negative lookahead
  [^@]*      # match zero or more chars other than '@'
  (?:        # begin non-capture group
    \+\.     # match '+.'
  |          # or
    \.\+     # match '.+'
  )          # end non-capture group
)            # end negative lookahead
[a-z\d+_.-]+ # match one or more chars from char class
@            # match '@'
[a-z\d-]+    # match one or more chars from char class
\.           # match '.'
[a-z\d-]*    # match zero or more chars from the char class
[a-z]        # match a letter
\z           # match end of string
/ix          # invoke case-indifferent and free-spacing modes
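
A quick way to try the expression from Ruby itself, as a sketch: the constant EMAIL_RULES and the sample addresses are my own choices for illustration, and the pattern is the single-line form with the case-indifferent flag.

```ruby
EMAIL_RULES = /\A(?=[^@]*[a-z])(?![^@]*(?:\+\.|\.\+))[a-z\d+_.-]+@[a-z\d-]+\.[a-z\d-]*[a-z]\z/i

samples = %w[a123@foo.bar A_b-1@Foo-1.Com 123@foo.bar a+.b@foo.bar a@foo.b4]
samples.each { |s| puts "#{s}: #{s.match?(EMAIL_RULES)}" }
# a123@foo.bar: true
# A_b-1@Foo-1.Com: true
# 123@foo.bar: false   (no letter before '@')
# a+.b@foo.bar: false  ('+.' before '@')
# a@foo.b4: false      (does not end with a letter)
```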
Cary Swoveland