0

I am trying to extract email addresses from an HTML file. I only need addresses whose top-level domain (TLD) has a single level. For example, in the list below, the addresses with a two-level TLD (such as .co.au or .ac.nz) are invalid in this case:

  • test@domain12.com
  • test@domain12.com
  • test123@domain-12.com
  • test@domain.co.au
  • test.abc@domain.ac.nz
  • test@abc.co
  • example@testdomain.net
  • sample@organization.org

I am using the following regex. It works fine when there are only email addresses, but if I add any text after an email address, it no longer matches.

(?=<\s|^)\b[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+.[a-zA-Z]{2,6}$(?=\s|$|.+)

success case:

  • test@domain12.com
  • example@testdomain.net
  • sample@organization.org

Failure case:

  • test@domain12.com random text after email address
  • example@testdomain.net random text after email address
  • sample@organization.org random text after email address

Any help in this scenario will be really appreciated.

  • Does this answer your question? [How to validate an email address using a regular expression?](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) – Michał Turczyn Nov 25 '20 at 14:28
  • Remove the dollar sign from the end of the regex. And don't forget that a TLD can be longer than 6 characters. – pavel Nov 25 '20 at 15:14
  • This solution considers TLDs of up to two levels, whereas I need a one-level TLD only – Saqib Shafique Nov 25 '20 at 15:29
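
As a rough illustration of the suggestion in the comments, here is a minimal Python sketch. It assumes Python's `re` module, which the question does not specify. The pattern is an adaptation, not the asker's exact regex: the trailing `$` is dropped, the dot before the TLD is escaped, the length cap on the TLD is removed, and the boundaries are expressed as `(?<!\S)` and `(?!\S)` because Python lookbehinds must be fixed-width. The names `ONE_LEVEL_EMAIL` and `find_one_level_emails` are made up for the example.

```python
import re

# Assumed adaptation of the question's pattern: no trailing `$`, an escaped dot,
# and whitespace lookarounds so a further suffix such as ".au" blocks the match.
ONE_LEVEL_EMAIL = re.compile(
    r"(?<!\S)[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?!\S)"
)

def find_one_level_emails(text):
    """Return whitespace-delimited addresses whose domain is exactly name.tld."""
    return ONE_LEVEL_EMAIL.findall(text)

sample = (
    "test@domain12.com random text after email address\n"
    "test@domain.co.au random text after email address\n"
    "sample@organization.org"
)
print(find_one_level_emails(sample))
# ['test@domain12.com', 'sample@organization.org']
```

Because the lookarounds require whitespace (or the start/end of the text) on both sides, an address directly followed by punctuation or wrapped in HTML tags would be skipped; the text would need to be stripped of markup first.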

3 Answers

0

I wrote this regex as a custom validator for extracting email addresses like these.

Try this:

^(?<check_Duplicate_Special_Symbol>(?![\w-.]*[\.@][\.@][\w-]*))(?<user>(?!\.)[\w.-]+)(?<domain>@(?:[A-Za-z][\w-.]*))(?<subDomain>(?:\.[A-Za-z][\w-.]*)+)$

For more info see regex-demo

However, it is not a good choice. Consider [How to validate an email address using a regular expression?](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) to get a correct validator.

Leonardo Scotti
Heo
0

Try this for a single match:

(?:\s)(.[^@]*@[^.]*\.[^.0-9A-Z]*)(?:\s)

Or this for a top-level match plus a match per section:

(?:\s)((.[^@]*)(?:@)([^.]*)(?:\.)([^.0-9A-Z]*))(?:\s)
Leonardo Scotti
0

I made this regex:

(?<=\s|^)([a-z0-9-.])+@+([a-z0-9-]*)\.([a-z]*)\s

It extracts email addresses with a one-level TLD from a string. You can tokenize the text on spaces/line breaks and match each token against the regex iteratively.
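
The tokenize-and-match idea could look something like the Python sketch below. Python is an assumption here, since the answer does not name a language. The token pattern is a tightened variant of the regex above (a single `@`, `+` instead of `*`) rather than the exact original, and `TOKEN_PATTERN` and `extract_by_tokens` are hypothetical names.

```python
import re

# Hypothetical pattern for a whole token: local part, "@", one domain label, one TLD label.
TOKEN_PATTERN = re.compile(r"[a-z0-9.-]+@[a-z0-9-]+\.[a-z]+", re.IGNORECASE)

def extract_by_tokens(text):
    """Split on whitespace and keep tokens that are entirely a one-level-TLD address."""
    return [token for token in text.split() if TOKEN_PATTERN.fullmatch(token)]

lines = (
    "test.abc@domain.ac.nz some text\n"
    "example@testdomain.net random text after email address"
)
print(extract_by_tokens(lines))
# ['example@testdomain.net']
```

Using `fullmatch` on each token means no lookarounds are needed: a token such as `test.abc@domain.ac.nz` fails because the leftover `.nz` cannot be consumed by the TLD part.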

follow this link