
I tried emailregex.com and used their Java regular expression, but it failed for international characters. I also tested the RFC standard regular expression on regexr.com. I need a regular expression that will validate all of the following:

  • cow牛@yahoo.com
  • email@example.com
  • user@butterfly-effect.com

I know a regular expression cannot capture 100% of valid email addresses, but it needs to accept international characters, which are allowed in email addresses.

I originally used this one (https://stackoverflow.com/a/26989421/148844), but it failed to validate hyphenated domains.

I tried:

InternetAddress ia = new InternetAddress("cow牛@yahoo.com");

But it threw:

javax.mail.internet.AddressException: Illegal character in address (cow牛@yahoo.com,3)

I tried searching the internet, but found no relevant results.

https://duckduckgo.com/?q=RFC+6530+regex

Morteza Jalambadani
Chloe
  • Just send an email to the address. Works 99% of the time. – Benjamin Urquhart Jun 15 '19 at 16:10
  • What is your source for what a valid email address is? – President James K. Polk Jun 15 '19 at 16:11
  • @BenjaminUrquhart It is for an API. – Chloe Jun 15 '19 at 16:13
  • What's the point in having a stricter regex than `.+@.+\\..+`? If someone wants to write a nonsense email, then they don't care if they need to write `/()/()@/&(/&(.&/(` or `YouDontGetMyMail@Buzz.off`. – Tom Jun 15 '19 at 16:18
  • [Questions asking us to recommend or find a book, tool, software library, tutorial or **other off-site resource** are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.](https://stackoverflow.com/help/on-topic) – Turing85 Jun 15 '19 at 16:26
  • @Turing85 I'm not asking for a book, tool, library, tutorial, or off-site resource. I did describe the problem and what I have tried. – Chloe Jun 15 '19 at 16:28
  • @Chloe well... your title reads "*Where can I **find**...*" =) – Turing85 Jun 15 '19 at 16:29
  • @Chloe Your title is misleading then. I also thought you were looking for resources. – Modus Tollens Jun 15 '19 at 16:29

2 Answers


If you turn on Unicode mode for this regex, it will match international word characters: basically letters and digits, but not punctuation.

This is the RFC 5322 regex with [a-zA-Z0-9] replaced by [^\W_]. That class is built on \w, which expands the set of allowed letters and digits when Unicode mode is on.
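In Java, the Unicode flag is Pattern.UNICODE_CHARACTER_CLASS (embedded form (?U), available since Java 7). A minimal sketch, using a hypothetical harness class WordClassDemo, showing how that flag changes what [^\W_] accepts on its own:

```java
import java.util.regex.Pattern;

public class WordClassDemo {
    // "Word character except underscore": letters and digits (plus a few extras in Unicode mode)
    static final String LETTER_OR_DIGIT = "[^\\W_]";

    static boolean matches(String s, int flags) {
        return Pattern.compile(LETTER_OR_DIGIT, flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        String ox = "\u725B"; // the character 牛
        // Default: \w is ASCII-only ([a-zA-Z_0-9]), so 牛 is rejected
        System.out.println(matches(ox, 0));                                // false
        // Unicode mode: \w covers Unicode letters, so 牛 is accepted
        System.out.println(matches(ox, Pattern.UNICODE_CHARACTER_CLASS)); // true
        // The underscore is excluded explicitly in either mode
        System.out.println(matches("_", Pattern.UNICODE_CHARACTER_CLASS)); // false
    }
}
```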

Raw:

(?im)^(?=.{1,64}@)(?:("[^"\\]*(?:\\.[^"\\]*)*"@)|((?:[^\W_](?:\.(?!\.)|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)?[^\W_]@))(?=.{1,255}$)(?:(\[(?:\d{1,3}\.){3}\d{1,3}\])|((?:(?=.{1,63}\.)[^\W_][-\w]*[^\W_]*\.)+[^\W_](?:[^\W_]|-){0,22}[^\W_])|((?=.{1,63}$)[^\W_][-\w]*))$   

(Don't forget the Unicode flag; in Java that is Pattern.UNICODE_CHARACTER_CLASS, or the embedded (?U).)

https://regex101.com/r/98Z0Ls/1

Stringed:

"(?im)^(?=.{1,64}@)(?:(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"@)|((?:[^\\W_](?:\\.(?!\\.)|[-!#\\$%&'\\*\\+/=\\?\\^`\\{\\}\\|~\\w])*)?[^\\W_]@))(?=.{1,255}$)(?:(\\[(?:\\d{1,3}\\.){3}\\d{1,3}\\])|((?:(?=.{1,63}\\.)[^\\W_][-\\w]*[^\\W_]*\\.)+[^\\W_](?:[^\\W_]|-){0,22}[^\\W_])|((?=.{1,63}$)[^\\W_][-\\w]*))$"  
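A quick way to try the stringed version in Java is sketched below. The class name UnicodeEmailRegexDemo and the sample list are illustrative assumptions; the pattern string is copied unchanged from above, and the Unicode flag is passed to Pattern.compile because (?im) alone does not enable it.

```java
import java.util.regex.Pattern;

public class UnicodeEmailRegexDemo {
    // The "Stringed" regex above, unchanged; (?im) is embedded, the Unicode flag is passed separately
    static final Pattern EMAIL = Pattern.compile(
        "(?im)^(?=.{1,64}@)(?:(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"@)|((?:[^\\W_](?:\\.(?!\\.)|[-!#\\$%&'\\*\\+/=\\?\\^`\\{\\}\\|~\\w])*)?[^\\W_]@))(?=.{1,255}$)(?:(\\[(?:\\d{1,3}\\.){3}\\d{1,3}\\])|((?:(?=.{1,63}\\.)[^\\W_][-\\w]*[^\\W_]*\\.)+[^\\W_](?:[^\\W_]|-){0,22}[^\\W_])|((?=.{1,63}$)[^\\W_][-\\w]*))$",
        Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isValid(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "cow\u725B@yahoo.com",       // cow牛@yahoo.com
            "email@example.com",
            "user@butterfly-effect.com",
            "a@b@c.com",                 // second @ in the domain part
            "#valid~@example.org"        // local part must start with a letter or digit here
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first three samples are the addresses from the question and should be accepted; the last two should be rejected.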


However, I suggest replacing [^\W_] with [\pL\pN] to exclude the lowline-type characters that Unicode \w also accepts, of which there are about 2000.

Raw:

(?im)^(?=.{1,64}@)(?:("[^"\\]*(?:\\.[^"\\]*)*"@)|((?:[\pL\pN](?:\.(?!\.)|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)?[\pL\pN]@))(?=.{1,255}$)(?:(\[(?:\d{1,3}\.){3}\d{1,3}\])|((?:(?=.{1,63}\.)[\pL\pN][-\w]*[\pL\pN]*\.)+[\pL\pN](?:[\pL\pN]|-){0,22}[\pL\pN])|((?=.{1,63}$)[\pL\pN][-\w]*))$  

https://regex101.com/r/HTqoaT/1

Stringed:

"(?im)^(?=.{1,64}@)(?:(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"@)|((?:[\\pL\\pN](?:\\.(?!\\.)|[-!#\\$%&'\\*\\+/=\\?\\^`\\{\\}\\|~\\w])*)?[\\pL\\pN]@))(?=.{1,255}$)(?:(\\[(?:\\d{1,3}\\.){3}\\d{1,3}\\])|((?:(?=.{1,63}\\.)[\\pL\\pN][-\\w]*[\\pL\\pN]*\\.)+[\\pL\\pN](?:[\\pL\\pN]|-){0,22}[\\pL\\pN])|((?=.{1,63}$)[\\pL\\pN][-\\w]*))$"
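To see the difference between the two classes, here is a small sketch (LowlineDemo is a hypothetical harness) using U+FF3F FULLWIDTH LOW LINE, a connector-punctuation character that Java's Unicode \w includes. Note that \pL and \pN are Unicode properties in Java even without the flag.

```java
import java.util.regex.Pattern;

public class LowlineDemo {
    // Compile a single character class in Unicode mode and test one string against it
    static boolean matchesClass(String charClass, String s) {
        return Pattern.compile(charClass, Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        String fullwidthLowLine = "\uFF3F"; // general category Pc (connector punctuation)
        // Unicode \w includes Pc, and only the ASCII '_' is subtracted, so this matches
        System.out.println(matchesClass("[^\\W_]", fullwidthLowLine));   // true
        // Letters and numbers only: the low line is rejected
        System.out.println(matchesClass("[\\pL\\pN]", fullwidthLowLine)); // false
        // International letters are still accepted
        System.out.println(matchesClass("[\\pL\\pN]", "\u725B"));         // true (牛)
    }
}
```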
  • Your regex doesn't allow special characters at the beginning and end of the local part (e.g. `#valid~@example.org`), but I believe these are allowed. – Luciano Jun 24 '22 at 21:42

I took @Tom's suggestion and made it very simple. I modified it slightly so that the part before the @ cannot contain another @ sign.

 "[^@]+@.+\\..+"


// Plain JDK; only the commented-out InternetAddress line needs JavaMail (javax.mail).
public class Tmp {
    // RFC 5322 regex from emailregex.com (ASCII-only character classes)
    private static final String REGEX1 = "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";
    // Simplified regex: no @ before the @ sign, and a dot somewhere after it
    private static final String REGEX2 = "[^@]+@.+\\..+";

    public static void main(String[] argv) {
        String foreignEmail = "cow牛@yahoo.com";
        String hyphenEmail = "games@butterfly-effected.com";

        System.out.println(foreignEmail.matches(REGEX1)); // false: 牛 is outside the ASCII classes
        System.out.println(hyphenEmail.matches(REGEX1));  // true
//      InternetAddress ia = new InternetAddress("cow牛@yahoo.com"); // throws AddressException
        System.out.println(foreignEmail.matches(REGEX2)); // true
        System.out.println(hyphenEmail.matches(REGEX2));  // true
    }
}
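One caveat: [^@]+ only keeps an @ out of the local part, because .+ after the @ can still match one. A sketch with a hypothetical stricter variant (not part of the code above) that forbids @ on both sides while still accepting the international and hyphenated addresses:

```java
public class SimpleEmailCheck {
    // The regex from this answer: only the local part is @-free
    static final String LOOSE = "[^@]+@.+\\..+";
    // Hypothetical stricter variant: no @ anywhere except the separator
    static final String STRICT = "[^@]+@[^@]+\\.[^@]+";

    public static void main(String[] args) {
        String twoAts = "a@b@c.com";
        System.out.println(twoAts.matches(LOOSE));                          // true
        System.out.println(twoAts.matches(STRICT));                         // false
        System.out.println("cow\u725B@yahoo.com".matches(STRICT));          // true
        System.out.println("user@butterfly-effect.com".matches(STRICT));    // true
    }
}
```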
Chloe
  • I guess you could also change the last part (for the top-level domain) to `[a-z]+` instead of `.+`, but it looks fine to me. – Tom Jun 15 '19 at 17:12
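A sketch of that TLD suggestion (TldCheck is a hypothetical harness). As written, [a-z]+ is lowercase ASCII only, so it rejects an uppercase TLD and an internationalized TLD such as .中国, which matters given the international-character requirement in the question.

```java
public class TldCheck {
    // Tom's variant: the part after the last dot must be lowercase ASCII letters
    static final String ASCII_TLD = "[^@]+@.+\\.[a-z]+";

    public static void main(String[] args) {
        System.out.println("cow\u725B@yahoo.com".matches(ASCII_TLD));          // true
        System.out.println("user@example.COM".matches(ASCII_TLD));             // false: case-sensitive
        System.out.println("user@example.\u4E2D\u56FD".matches(ASCII_TLD));    // false: TLD is 中国
    }
}
```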