I'm using the regular expression String regex = "[0-9a-z]+@[0-9a-z]+.+[0-9a-z]"; to validate an email address. Basically, I'm trying to make it so that an email only matches this pattern if it begins with a string of alphanumeric characters, followed by one @ symbol, followed by another string of alphanumeric characters, followed by one ., and finally a string of alphanumeric characters. What's failing is that when I enter an email without a string of alphanumerics after the last ., the program still reports a match. How do I make it so that there MUST be another string of alphanumerics after the .? The whole code is:

import java.util.Scanner;
import java.util.regex.*;

public class Regex
{   
    public static void main (String[]args){

        Scanner input = new Scanner(System.in);
        System.out.println("Please enter your Email");
        String mail = input.nextLine();

        String regex = "[0-9a-z]+@[0-9a-z]+.+[0-9a-z]";

        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(mail);

        if(m.find()) {
            System.out.println("VALID");
        } else {
            System.out.println("INVALID");
        }
    }
}
Hovercraft Full Of Eels
Ziyue Wang
  • .+ will give you "at least one arbitrary character". I think you meant to have the "+" after the last brackets. – G. Bach Feb 03 '13 at 17:32
  • For this kind of problem, http://www.regexper.com/ is my tool of choice. – Petr Janeček Feb 03 '13 at 17:32
  • Related: http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address – andersoj Feb 03 '13 at 17:43
  • Your regular expression for email addresses is very restrictive. For example, British email addresses such as @example.co.uk would not validate, nor would email addresses with . or + in the user part. Check out http://blog.gerv.net/2011/05/html5_email_address_regexp/ for a better one (still not the full spec, but closer). – Charlie Feb 03 '13 at 17:44
  • You shouldn't use regular expressions to validate email addresses. – David Conrad Feb 03 '13 at 18:13

2 Answers

An unescaped . in the expression stands for any character. You need to use either \\. (the backslash is doubled inside a Java string literal) or [.] to match a literal dot.

String regex = "[0-9a-z]+@[0-9a-z]+[.]+[0-9a-z]";
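As a rough illustration of the difference, the sketch below runs the question's pattern and the two literal-dot spellings against a few inputs using find(), as the question's code does. The class name LiteralDotDemo, the helper found, and the sample addresses are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex strings come from the thread.
class LiteralDotDemo {
    // Pattern from the question: the unescaped '.' matches any character.
    static final Pattern ANY_CHAR = Pattern.compile("[0-9a-z]+@[0-9a-z]+.+[0-9a-z]");
    // Same pattern with the dot written as a character class.
    static final Pattern CLASS_DOT = Pattern.compile("[0-9a-z]+@[0-9a-z]+[.]+[0-9a-z]");
    // Equivalent spelling with a backslash escape (doubled in a Java string literal).
    static final Pattern ESCAPED_DOT = Pattern.compile("[0-9a-z]+@[0-9a-z]+\\.+[0-9a-z]");

    // Mirrors the question's check: true if any substring of the input matches.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // Sample inputs are illustrative only.
        for (String s : new String[] {"user@example.com", "user@example.", "user@example"}) {
            System.out.println(s + ": anyChar=" + found(ANY_CHAR, s)
                    + " classDot=" + found(CLASS_DOT, s)
                    + " escapedDot=" + found(ESCAPED_DOT, s));
        }
    }
}
```

With the unescaped dot, all three inputs are accepted, because .+ is free to consume ordinary letters. With either literal-dot form, only the input that actually has a character after a dot is accepted.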

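The + after the dot means "one or more occurrences of the prior expression". Above, the "prior expression" is a single dot, so that pattern accepts a run of consecutive dots. To match multiple segments in the e-mail's domain, group a segment together with its dot in parentheses so that the + applies to the whole group: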

String regex = "[0-9a-z]+@([0-9a-z]+[.])+[0-9a-z]+";
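To make the effect of the parentheses concrete, here is a small sketch comparing the two shapes. It uses matches() rather than find() so that the whole input has to be consumed, which makes the difference visible. FLAT is the first pattern above with a trailing + added (an assumption made for this comparison), and the class name GroupingDemo, the helper whole, and the sample addresses are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class GroupingDemo {
    // '+' applies only to the dot: one segment, then one or more dots, then one segment.
    // (Trailing '+' added here as an assumption so the last segment can be longer than one character.)
    static final Pattern FLAT = Pattern.compile("[0-9a-z]+@[0-9a-z]+[.]+[0-9a-z]+");
    // '+' applies to the group: one or more "segment followed by a dot", then a final segment.
    static final Pattern GROUPED = Pattern.compile("[0-9a-z]+@([0-9a-z]+[.])+[0-9a-z]+");

    // True only if the pattern matches the entire input.
    static boolean whole(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Sample inputs are illustrative only.
        for (String s : new String[] {"user@example.com", "user@mail.example.com", "user@example..com"}) {
            System.out.println(s + ": flat=" + whole(FLAT, s) + " grouped=" + whole(GROUPED, s));
        }
    }
}
```

The flat form rejects the multi-segment domain but accepts the doubled dot. The grouped form does the opposite, which is usually what is wanted.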
Sergey Kalinichenko
  • I think the OP forgot a "+" after the last set of brackets; I doubt he wants the address to end in one dot followed by one alphanumeric. – G. Bach Feb 03 '13 at 17:38
  • @G.Bach You're right, there was no plus at the end. This is now fixed. – Sergey Kalinichenko Feb 03 '13 at 17:39
  • He also didn't want multiple segments in the e-mail's domain address, even though they are perfectly valid. Regular expressions should never be used to validate email addresses under any circumstances whatsoever. It's horribly wrong. – David Conrad Feb 03 '13 at 18:14
  • @DavidConrad Why is that? Aren't email addresses a regular language? – G. Bach Feb 03 '13 at 19:02
  • Perhaps just barely, but they have so many rules and exceptions that a proper regex to match them, if it's even possible, would fill an entire sheet of A4 paper and be totally incomprehensible and unmaintainable. But people always start out thinking "It's easy, I know what an email address looks like" and then they write something like the above that excludes tons of perfectly valid email addresses. (Valid per RFC 2822, that is.) – David Conrad Feb 12 '13 at 16:30

A slightly better regex would include start of line ^ and end of line $ anchors.

Otherwise, because the code calls find(), the input passes as soon as any part of the string looks like a valid email. Also, instead of the plus sign (one or more) on the final segment, you could restrict it to 2 to 4 characters with {2,4}. Without these in place, something like

myemail@gmail.com@thisIsOdd.helloworld.anythingelse

will erroneously be reported as valid.

String regex = "^[0-9a-z]+@([0-9a-z]+[.])+[0-9a-z]{2,4}$";
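A quick way to see what the anchors change is to run the same pattern with and without them through find(). The sketch below does that. The class name AnchorDemo, the helper found, and the extra sample addresses are assumptions for illustration. It also shows Matcher.matches(), which requires the whole input to match and so behaves much like the anchored pattern here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex strings come from the thread.
class AnchorDemo {
    static final Pattern UNANCHORED = Pattern.compile("[0-9a-z]+@([0-9a-z]+[.])+[0-9a-z]{2,4}");
    static final Pattern ANCHORED = Pattern.compile("^[0-9a-z]+@([0-9a-z]+[.])+[0-9a-z]{2,4}$");

    // True if any part of the input matches (what the question's code does).
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String odd = "myemail@gmail.com@thisIsOdd.helloworld.anythingelse";
        System.out.println("unanchored find: " + found(UNANCHORED, odd));
        System.out.println("anchored find:   " + found(ANCHORED, odd));
        // matches() demands that the entire input match, even without ^ and $.
        System.out.println("unanchored matches(): " + UNANCHORED.matcher(odd).matches());
        System.out.println("plain address:   " + found(ANCHORED, "user@example.com"));
        // The {2,4} limit also rejects longer final segments.
        System.out.println("long last part:  " + found(ANCHORED, "user@example.museum"));
    }
}
```

One thing to keep in mind is that {2,4} is a trade-off: it tightens the check but also turns away addresses whose last segment is longer than four characters.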

DangerDan