1

I have to extract all email addresses from some .txt documents. These emails may have these formats:

  1. a@abc.com
  2. {a, b, c}@abc.edu
  3. some other formats including some @ signs.

I chose Ruby as my first language for writing this program, but I don't know how to write the regex. Could someone help me? Thank you!

Wayne Werner
Ikbear
  • Related: [Extract email addresses from a block of text](http://stackoverflow.com/questions/504860/extract-email-addresses-from-a-block-of-text) – miku Jul 07 '10 at 11:55
  • I was about to suggest extracting all nonspace char-sequences with `@` in them - but it wouldn't work for your second example. – Amarghosh Jul 07 '10 at 12:31

3 Answers

6

Depending on the nature of your .txt documents, you don't have to use one of the complicated regexes that attempt to validate email addresses. You're not trying to validate anything. You're just trying to grab what's already there. Generally speaking, a regex to grab what's already there can be much simpler than a regex that needs to validate input.

An important question is whether your .txt documents contain @ signs that are not part of an email address you want to extract.

This regex handles your first two requirements:

\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+
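
Since the question mentions Ruby, here is a minimal sketch of how this pattern could be applied with `String#scan`. The helper names `extract_emails` and `expand_group` and the sample text are made up for illustration, and `expand_group` assumes that `{a, b, c}@abc.edu` is shorthand for three separate addresses, which the question does not state explicitly.

```ruby
# Pattern copied from the answer: a plain address, or a {a, b, c}@domain group.
EMAIL_RE = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: returns every match found in the given string.
def extract_emails(text)
  text.scan(EMAIL_RE)
end

# Hypothetical helper, assuming {a, b, c}@abc.edu is shorthand for
# a@abc.edu, b@abc.edu and c@abc.edu. Plain addresses pass through unchanged.
def expand_group(match)
  return [match] unless match.start_with?("{")
  names, domain = match[1..-1].split("}@", 2)
  names.split(/, */).map { |name| "#{name}@#{domain}" }
end

# Made-up sample text, used only for illustration.
sample = "Contact a@abc.com or {a, b, c}@abc.edu for details"
found = extract_emails(sample)
p found
# => ["a@abc.com", "{a, b, c}@abc.edu"]
p found.flat_map { |m| expand_group(m) }
# => ["a@abc.com", "a@abc.edu", "b@abc.edu", "c@abc.edu"]
```

To run this over real documents, the same helper could be fed the result of `File.read(path)` for each path returned by `Dir.glob("*.txt")`.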

Or if you want to allow any sequence of non-space characters containing an @ sign, plus your second requirement (which has spaces):

\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+
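
A short Ruby sketch of the looser pattern follows. It also shows one trade-off of `\S+`: punctuation that touches an address is captured along with it. The helper names, the sample text, and the trailing-punctuation cleanup are assumptions added for illustration, not part of the answer.

```ruby
# Looser pattern copied from the answer: any non-space run around an @ sign,
# or a {a, b, c}@domain group.
LOOSE_RE = /\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: returns every match found in the given string.
def extract_loose(text)
  text.scan(LOOSE_RE)
end

# Hypothetical cleanup step: drop sentence punctuation stuck to the end.
def strip_trailing_punctuation(match)
  match.sub(/[.,;:!?]+\z/, "")
end

# Made-up sample text, used only for illustration.
sample = "Write to fo.o@bar.com, or to {a, b, c}@abc.edu"
raw = extract_loose(sample)
p raw
# => ["fo.o@bar.com,", "{a, b, c}@abc.edu"]
p raw.map { |m| strip_trailing_punctuation(m) }
# => ["fo.o@bar.com", "{a, b, c}@abc.edu"]
```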
Jan Goyvaerts
  • Take care: it will not accept "-" inside mail addresses. – gies0r Oct 21 '16 at 10:51
  • Nor will it accept "." before the @ sign, so fo.o@bar.com => o@bar.com. – Eskim0 Jun 29 '17 at 00:24
  • This is an answer to Ikbear's question about his/her specific requirements. It is NOT a general-purpose guide on how to match email addresses with regexes. If you read my article at http://www.regular-expressions.info/email.html then you'll learn that there are always trade-offs. – Jan Goyvaerts Jun 29 '17 at 06:39
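
The behaviour described in the comments above can be reproduced in Ruby with the first pattern from the answer. `WIDER_RE` is a hypothetical variant, not taken from the answer, that also allows "." and "-" before the @ sign. As the last comment notes, any such change is a trade-off: the wider class will also pick up stray dots or hyphens that happen to precede an address.

```ruby
# First pattern from the answer above.
ANSWER_RE = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical variant (not from the answer): also allow "." and "-" before the @.
WIDER_RE = /[\w.-]+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

p "fo.o@bar.com".scan(ANSWER_RE)   # => ["o@bar.com"]
p "foo-bar@x.org".scan(ANSWER_RE)  # => ["bar@x.org"]
p "fo.o@bar.com".scan(WIDER_RE)    # => ["fo.o@bar.com"]
p "foo-bar@x.org".scan(WIDER_RE)   # => ["foo-bar@x.org"]
```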
2

Have a look at this rather in-depth analysis:

The upshot is to use this regular expression:

/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i
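
Because this pattern is anchored with `^` and `$`, it behaves as a validator for a whole line rather than an extractor for addresses embedded in running text. A minimal Ruby sketch of that difference is below; the constant name `VALIDATE_RE` and the sample strings are made up for illustration.

```ruby
# Pattern copied verbatim from the answer. In Ruby, ^ and $ anchor at line
# boundaries, so it only matches when an entire line is an address.
VALIDATE_RE = /^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i

# A line that is nothing but an address passes.
p VALIDATE_RE.match?("a@abc.com")                # => true
# An address embedded in a sentence is rejected because of the anchors.
p VALIDATE_RE.match?("Contact a@abc.com today")  # => false
# The grouped form from the question contains spaces and commas, so it fails too.
p VALIDATE_RE.match?("{a, b, c}@abc.edu")        # => false
```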
Jonathan
1

I found this at https://www.shellhacks.com/regex-find-email-addresses-file-grep/, and it met my needs:

\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b
Eskim0
  • This regex does not meet Ikbear's requirement #2. It also matches john@aol...com and fails to match emails on new TLDs such as .solutions that are longer than 6 characters. Great if this regex meets your needs, but not an answer to this question, and not a good general-purpose email regex in 2017. – Jan Goyvaerts Jun 29 '17 at 06:46
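
The points raised in the comment above can be checked in Ruby with `String#scan`. This is only a sketch: the constant name `GREP_RE` is made up here, and `jane@example.solutions` is an invented sample address.

```ruby
# Pattern copied from the answer above.
GREP_RE = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b/

# A plain address is found.
p "a@abc.com".scan(GREP_RE)               # => ["a@abc.com"]
# The grouped form from the question (requirement 2) is not found.
p "{a, b, c}@abc.edu".scan(GREP_RE)       # => []
# Consecutive dots in the domain are accepted.
p "john@aol...com".scan(GREP_RE)          # => ["john@aol...com"]
# A TLD longer than six letters is rejected.
p "jane@example.solutions".scan(GREP_RE)  # => []
```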