I want to extract all valid email addresses from a given text.

Emails are considered to be in the format user@host, where:

user is a sequence of letters and digits, where '.', '-' and '_' can appear between them.

host is a sequence of at least two words, separated by dots '.'. Each word is a sequence of letters that can have hyphens '-' between the letters, but must end with a letter.

• Examples of valid emails:

  • s.kiki@hotmail.co.uk
  • no-reply@github-bg.com.uk.bg
  • no_reply@github-bg.com.uk.bg

I wrote this regex:

/(?<!\S)[a-z0-9]+[\-\._]*[a-z0-9]+@[a-z]+\-*[a-z]+(\.[a-z0-9\-]+){1,}(?=\s|$)/g

But it also matches this case: suport@github.com-

How can I get rid of the trailing '-'?
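
For reference, a minimal reproduction in JavaScript, assuming an engine with lookbehind support (for example a recent Node.js). The names `questionRegex` and `found` are only illustrative. The last group uses the class `[a-z0-9\-]+`, so nothing prevents a host word from ending in '-':

```javascript
// The regex from the question, unchanged.
const questionRegex = /(?<!\S)[a-z0-9]+[\-\._]*[a-z0-9]+@[a-z]+\-*[a-z]+(\.[a-z0-9\-]+){1,}(?=\s|$)/g;

// The first address should be rejected, but the trailing '-' is accepted
// because '-' is part of the character class in the last group.
const found = "suport@github.com- s.kiki@hotmail.co.uk".match(questionRegex);
console.log(found); // [ 'suport@github.com-', 's.kiki@hotmail.co.uk' ]
```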

  • Regexes for emails are complicated. There are many rules about allowed special characters and string lengths. While you can do a basic email test, it is better to use an existing library that has been extensively tested. Example of the rules: https://en.m.wikipedia.org/wiki/Email_address. Use something like yup. – Steve Tomlin Aug 02 '21 at 14:46
  • Or, if you just want a basic example which excludes all special characters: /^[a-z]\w*(\.[a-z]\w*)*\@[a-z]\w*(\.[a-z]\w*)*$/ – Steve Tomlin Aug 02 '21 at 14:56
  • What are your arbitrary rules for a "valid" email? If you want to be compliant with the RFC then see https://stackoverflow.com/q/201323/2191572 – MonkeyZeus Aug 02 '21 at 16:08
  • I have a task with this condition: Emails are considered to be in the format user@host, where: • user is a sequence of letters and digits, where '.', '-' and '_' can appear between them. • host is a sequence of at least two words, separated by dots '.'. Each word is a sequence of letters and can have hyphens '-' between the letters, but must end with a letter. • Examples of valid emails: s.kiki@hotmail.co.uk, no-reply@github-bg.com.uk.bg, no_reply@github-bg.com.uk.bg – Маргарита Георгиева Aug 03 '21 at 10:51

1 Answer

To exclude a - at the end of your host words, you can use two character classes: one that includes the hyphen and one that excludes it.

Allow the class that includes the - any number of times, and require the class that excludes it at least once at the end of each word, so that every word ends with a letter.

That can look like this:

([a-zA-Z-]*[a-zA-Z]+\.){1,}[a-zA-Z-]*[a-zA-Z]+$

See https://regex101.com/r/LwHlU3/1 for what this does and does not match.
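
Combined with a user part, the whole extraction could be sketched in JavaScript roughly as follows. This is only a sketch under a few assumptions: the user pattern is my reading of the rules in the question, the `(?<!\S)` and `(?=\s|$)` boundaries are carried over from the question's regex, matching is case-insensitive, and `extractEmails`, `USER` and `WORD` are hypothetical names. It needs an engine with lookbehind support.

```javascript
// Assumption: letters/digits, with single '.', '_' or '-' only between them.
const USER = "[a-z0-9]+(?:[._-][a-z0-9]+)*";
// Host word from the answer: hyphens allowed, but the word must end with a letter.
const WORD = "[a-z-]*[a-z]+";
// At least two words separated by dots, delimited by whitespace or string edges.
const EMAIL = new RegExp(`(?<!\\S)${USER}@(?:${WORD}\\.)+${WORD}(?=\\s|$)`, "gi");

// Hypothetical helper: returns every match in the text, or an empty array.
function extractEmails(text) {
  return text.match(EMAIL) || [];
}

const sample = "s.kiki@hotmail.co.uk suport@github.com- no-reply@github-bg.com.uk.bg";
console.log(extractEmails(sample));
// [ 's.kiki@hotmail.co.uk', 'no-reply@github-bg.com.uk.bg' ]
```

Like the pattern above, `WORD` still accepts a word that starts with a hyphen. If that should be rejected too, `[a-z]+(?:-[a-z]+)*` is a stricter alternative.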

FlyingFoX