
I'm working on a basic project for class, but I've run into an issue. The validation works when "test@test.com" is entered. However, on further testing, entering "test@test.com.com" still returns valid. Here's the code:

  // Requires: import java.util.Scanner;
  Scanner scan = new Scanner(System.in);
  // The pattern does not change, so build it once, outside the loop.
  String email_regex = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
    + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
  String emailAddress;
  boolean b;

  // Keep asking until the input matches the pattern.
  do
  {
    System.out.println("Enter a valid email address: ");
    emailAddress = scan.next();
    b = emailAddress.matches(email_regex);

    if (b)
    {
      System.out.println("The email address \"" + emailAddress + "\" is valid.");
    }
    else
    {
      System.out.println("The email address \"" + emailAddress + "\" IS NOT valid.");
    }
  } while (!b);

It works as intended, except that it doesn't mark "test@test.com.com" or any similar address as invalid. Suggestions would be appreciated!

EDIT: I'll add the instructions and the feedback I received.

The instructions: An email address contains the @ character. Write a program that asks for an email address input from the user and determines whether it is a valid address or not. This is based on the presence of the @ character and no spaces in the string. You do not need to worry about any other characters in the input word. Output the result "The word IS an email address" or "The word IS NOT an email address"

Example - Input: testuser@mydomain.com Output: The word IS an email address

Input: my123user Output: The word IS NOT an email address
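
Read literally, those instructions only ask for two things: an `@` somewhere in the input and no spaces. A minimal sketch of just that check follows. The class name `EmailWordCheck` and the method name `isEmailAddress` are made up for illustration, and treating every kind of whitespace (not only the space character) as disqualifying is an assumption.

```java
import java.util.Scanner;

// Hypothetical class name; the assignment does not prescribe one.
final class EmailWordCheck {

    // Literal reading of the instructions: the word must contain '@'
    // and must not contain whitespace. Nothing else is checked.
    static boolean isEmailAddress(String word) {
        if (word == null || word.indexOf('@') < 0) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            // Assumption: tabs and other whitespace count as "spaces" too.
            if (Character.isWhitespace(word.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter an email address: ");
        // nextLine() keeps embedded spaces so the no-spaces rule can be tested;
        // next() would silently stop at the first space.
        String word = scan.nextLine();
        System.out.println(isEmailAddress(word)
            ? "The word IS an email address"
            : "The word IS NOT an email address");
    }
}
```

Under this reading, "testuser@mydomain.com" is accepted and "my123user" is rejected, as in the examples. "test@test.com.com" is accepted as well, because nothing in the instructions rules it out.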

My feedback:

Program incorrectly identifies test@test.com.com as a valid email. Code is not formatted properly and indentation is incorrect. Please ensure your code is indented properly with each new set of curly brackets (which indicates scope).

ccalex
  • But that *could* be a valid domain. OK, `com.com` might not be itself but you can have subdomains, so `email@subdomain.maindomain.tld` is a valid construct. – VLAZ Apr 27 '20 at 14:10
  • Does this answer your question? [Java regex email](https://stackoverflow.com/questions/8204680/java-regex-email) – Arvind Kumar Avinash Apr 27 '20 at 14:11
  • @VLAZ Based on the assignment, the professor doesn't count it as valid. I got my grade back today and he commented "Program incorrectly identifies test@test.com.com as a valid email," so I'm not sure how to properly fix the issue so it returns properly. – ccalex Apr 27 '20 at 14:13
  • @ArvindKumarAvinash I'm not sure the dupe applies. The question is why *this code here* behaves not as expected. Linking to a different question with different code and different answers doesn't address that. – VLAZ Apr 27 '20 at 14:14
  • @ccalex so what *is* considered valid? Should the domain part only include a single dot? Or should it *specifically* disallow `.com.com`? Perhaps slightly more generally repetition - no `.net.net` either, but `subdomain.domain.com` is fine? – VLAZ Apr 27 '20 at 14:15
  • I wouldn't call email validation a basic project for anyone, see https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/ and here is a link to a regex I came up with using Perl https://regex101.com/library/31plIS – JGNI Apr 27 '20 at 14:16
  • @JGNI yeah. I don't think it's a really good assignment. Especially since it purposefully misleads people to what's valid or not. – VLAZ Apr 27 '20 at 14:17
  • I agree with VLAZ. The mentioned address is valid. You should define the purpose of your regex based validation. In my opinion, an email address validation shouldn't be more complicated than `\S+@\S+`. This will only give feedback that the user confused input fields. If the user doesn't want to give his email address, he'll give test@example.com (or worse: a random address) in the end. That's why people usually have to prove they can receive an email at the given address. – steffen Apr 27 '20 at 14:18
  • @ccalex your professor is wrong, show them the rfc :-) – JGNI Apr 27 '20 at 14:19
  • You can even receive mails to ccalex@localhost, ccalex@10.0.0.1, ccalex@net.net.net.net. – steffen Apr 27 '20 at 14:21
  • @VLAZ I found the assignment confusing myself, honestly, though it looks like it should be simple. But I appreciate the feedback, of course! :D – ccalex Apr 27 '20 at 14:23
  • @JGNI I'm starting to think so, too, lol. – ccalex Apr 27 '20 at 14:23
  • Oh, and concerning your code... :) 1. There's no need for `^` and `$` when you use `matches()`, 2. in a character class (`[...]`) a minus should be the very first or last character, 3. there are many more valid characters in email addresses, 4. you should make the pattern case insensitive. – steffen Apr 27 '20 at 14:24
  • All the following are valid email addresses Abc\@def@example.com, Fred\ Bloggs@example.com, Joe.\\Blow@example.com, "Abc@def"@example.com, "Fred Bloggs"@example.com, customer/department=shipping@example.com, $A12345@example.com, !def!xyz%abc@example.com, _somename@example.com – JGNI Apr 27 '20 at 14:24
  • @steffen I get what you're saying. I feel like "test@test.com.com" could be a valid email address given certain circumstances, but the answer is still wrong to my professor. I'm not content with an 80 on an assignment because of it. Lol~ :p – ccalex Apr 27 '20 at 14:25
  • OK, that assignment does **NOT** say that `.com.com` is invalid. It only says that you need an `@` and no spaces. Which is...mostly correct. I guess correct enough - it's really better than a lot of email validators. Yet that is rejected. – VLAZ Apr 27 '20 at 14:26
  • @VLAZ I agree. I feel like the instructions could have been clearer, it would have eliminated a lot of stressors, for sure. I've got the meat of the assignment, now I just need to figure out this minor issue. Hence the feedback stating "Program incorrectly identifies test@test.com.com as a valid email." D: – ccalex Apr 27 '20 at 14:30
  • @JGNI Can you tell this to my professor? I'd appreciate it. Lol – ccalex Apr 27 '20 at 14:31
  • You cannot, with a single regexp, sort out those cases while still allowing more than one `.` in the domain part, without hardcoding it. This means: if you really need to do this, you need to explicitly disallow every single combination of domain and TLD you wish to reject and pack that into your regexp, or implement a separate check. – Johannes H. Apr 27 '20 at 14:32
  • Thanks for the feedback @steffen ! – ccalex Apr 27 '20 at 14:32
  • @ccalex ask your professor, if john@example.co.uk is a valid email address – JGNI Apr 27 '20 at 14:34
  • @JGNI I'd assume he would say no, mostly because he would say "we're in America and that's not what the assignment is asking for." – ccalex Apr 27 '20 at 14:42
  • @JGNI We tend to forget we're not the only country in the world, it's true. But that's a different discussion for a different place. I appreciate all the feedback, though! :) – ccalex Apr 27 '20 at 14:53
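
Two suggestions from the comments above can be sketched together: steffen's loose `\S+@\S+` pattern, and the "separate check" Johannes H. mentions. What that separate check should test is not stated anywhere in the thread. The version here assumes the grader objects to a repeated final domain label (`.com.com`, `.net.net`), which is only one of the readings VLAZ raises. The names `LooseEmailCheck`, `looksLikeEmail`, `hasRepeatedLastLabel`, and `acceptedUnderAssumedRule` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class; a sketch of two ideas from the comments, not a full validator.
final class LooseEmailCheck {

    // steffen's pattern. No ^ or $ is needed because Matcher.matches()
    // only succeeds when the whole input matches.
    private static final Pattern LOOSE = Pattern.compile("\\S+@\\S+");

    static boolean looksLikeEmail(String input) {
        return LOOSE.matcher(input).matches();
    }

    // Assumed rule, not taken from the assignment: flag domains whose last
    // two dot-separated labels are identical, e.g. "test.com.com".
    static boolean hasRepeatedLastLabel(String input) {
        String domain = input.substring(input.lastIndexOf('@') + 1);
        String[] labels = domain.split("\\.");
        int n = labels.length;
        return n >= 2 && labels[n - 1].equalsIgnoreCase(labels[n - 2]);
    }

    // Regex for the general shape, plain code for the special case.
    static boolean acceptedUnderAssumedRule(String input) {
        return looksLikeEmail(input) && !hasRepeatedLastLabel(input);
    }

    public static void main(String[] args) {
        String[] samples = {
            "test@test.com", "test@test.com.com", "john@example.co.uk", "my123user"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + acceptedUnderAssumedRule(s));
        }
    }
}
```

Keeping the special case out of the regex means subdomains such as `email@subdomain.maindomain.tld` and addresses like `john@example.co.uk` still pass. The repeated-label rule does reject addresses that the comments point out are technically valid, so it is only useful for satisfying this particular grader.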

1 Answer


Try the regex below; it works for me:

String email_regex = "^(([\\w-]+\\.)+[\\w-]+|([a-zA-Z]{1}|[\\w-]{2,}))@"
                + "[A-Za-z0-9-]+(\\.com|abc\\.co|abc\\.nz|abc\\.org|abc\\.net)$";

QuickSilver
  • This will also allow `email@domain.com.com` – VLAZ Apr 27 '20 at 14:30
  • That doesn't match the rfc's, but it does match what the professor is incorrectly calling an invalid email address – JGNI Apr 27 '20 at 14:31
  • I don't see how this would not match "test@test.com.com". `test` matches ``[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+``. @ matches @. `test.com.com` matches `[a-zA-Z0-9.-]+` just fine. So: same issue as the regexp provided in the question. – Johannes H. Apr 27 '20 at 14:35
  • I tried the regex, but it still doesn't eliminate email@domain.com.com. It still returns as "valid" when it's meant to be "invalid." – ccalex Apr 27 '20 at 14:35
  • @ccalex Corrected the regex – QuickSilver Apr 27 '20 at 14:36
  • Now it won't match `test@abc.net` either – Johannes H. Apr 27 '20 at 14:38
  • @Quicksilver you're the best, thank you !! I just tested the code and it works. Thank you so much xx – ccalex Apr 27 '20 at 14:38
  • @ccalex If it works please consider accepting the answer – QuickSilver Apr 27 '20 at 14:39
  • @JohannesH. I'm just glad it works for at least email@domain.com. To be fair, the assignment instructions weren't specific. I'll try to resubmit the assignment and see what he says. – ccalex Apr 27 '20 at 14:41
  • Please, have a look at these sites: [TLD list](https://www.iana.org/domains/root/db); [valid/invalid addresses](https://en.wikipedia.org/wiki/Email_address#Examples); [regex for RFC822 email address](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) – Toto Apr 27 '20 at 15:30
  • @Toto Thank you so much, I'll be checking these links out ! – ccalex Apr 27 '20 at 19:56
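
The behaviour discussed in these comments can be checked with a small harness around the corrected regex from the answer. This is only a sketch: the class name `AnswerRegexCheck`, the helper `matchesAnswerRegex`, and the choice of sample addresses are assumptions. The pattern itself is copied unchanged from the answer.

```java
// Hypothetical harness; only EMAIL_REGEX comes from the answer.
final class AnswerRegexCheck {

    // Copied verbatim from the answer above.
    static final String EMAIL_REGEX =
        "^(([\\w-]+\\.)+[\\w-]+|([a-zA-Z]{1}|[\\w-]{2,}))@"
        + "[A-Za-z0-9-]+(\\.com|abc\\.co|abc\\.nz|abc\\.org|abc\\.net)$";

    static boolean matchesAnswerRegex(String input) {
        // String.matches() requires the whole input to match the pattern.
        return input.matches(EMAIL_REGEX);
    }

    public static void main(String[] args) {
        // Sample inputs taken from the question and the comments.
        String[] samples = {
            "test@test.com", "email@domain.com", "test@test.com.com", "test@abc.net"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesAnswerRegex(s));
        }
    }
}
```

The domain part `[A-Za-z0-9-]+` cannot contain a dot, so exactly one label may precede `.com`. That is why "test@test.com.com" is rejected. It also explains Johannes H.'s observation: "test@abc.net" fails because the `abc\\.net` alternative needs at least one more character in front of `abc`.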