I would like a regular expression that extracts email addresses from a String (using Java regular expressions), and that really works.

EugeneP
  • E-mail addresses and regex: http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses – Bart Kiers Feb 12 '10 at 09:43
  • Yep. But in fact, validation is not always what we need. If you put in the ^ and $ anchors, it won't work on arbitrary text. I hope my question and answer will be useful to others as well. – EugeneP Feb 12 '10 at 09:46
  • The (many!) patterns/answers posted in that thread should provide you with more than enough information IMO. – Bart Kiers Feb 12 '10 at 12:52
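To see the difference EugeneP describes, here is a minimal sketch that reuses the regex from the top answer (the class and method names are illustrative only). With `^` and `$` added, the pattern acts as a validator and finds nothing inside running text; without them, `Matcher.find()` picks the address out of the sentence.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // The regex from the top answer, written as a Java string (backslashes doubled).
    static final String CORE =
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

    // Anchored with ^ and $: the whole input must be a single address (validation).
    static final Pattern ANCHORED = Pattern.compile("^" + CORE + "$");

    // Unanchored: Matcher.find() can locate an address anywhere in the text (extraction).
    static final Pattern UNANCHORED = Pattern.compile(CORE);

    static boolean containsAddress(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Write to john.doe@example.com for details.";
        System.out.println(containsAddress(ANCHORED, text));   // false
        System.out.println(containsAddress(UNANCHORED, text)); // true
    }
}
```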

5 Answers

Here's a regular expression that really works. I spent an hour searching the web and testing different approaches, and most of them didn't work even though Google ranked those pages highly.

I want to share with you a working regular expression:

    [_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})

Here's the original link: http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/
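A minimal sketch of how this regex can be used for extraction rather than validation, assuming the goal is simply to collect every match in order (`EmailExtractor` and `extract` are hypothetical names): compile the pattern once and call `Matcher.find()` in a loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailExtractor {
    // The regex from this answer, compiled once and reused.
    private static final Pattern EMAIL = Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // Returns every substring of text that the pattern matches, in order of appearance.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {         // find() scans forward; no ^/$ anchors needed
            result.add(m.group()); // group() is group(0), the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Mail alice@example.com or bob.smith@mail.example.org today."));
    }
}
```

Because `find()` resumes after the previous match, the loop ends naturally once no further address is found.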

EugeneP
  • Sorry, this is not right. It will fail for plus-addressing (http://en.wikipedia.org/wiki/E-mail_address#Sub-addressing), among other things (an example is foo+@gmail.com). Writing a correct regular expression for email addresses is /very/ hard (if not impossible). See also http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/201378#201378 – Matthew Flaschen Feb 12 '10 at 13:04
  • And that's not even mentioning ICANN's decision to allow non-Latin characters in email addresses: http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/1931322#1931322 – BalusC Feb 12 '10 at 19:21
  • Well, you're right, I didn't know that a plus sign could be part of an email address. It can easily be added between the square brackets. But I'm pretty sure 99.9% of people do not use it, and most email servers do not allow a plus sign as part of the email address. I absolutely agree that there may be situations where any regular expression will fail at email validation/extraction. Still, this one worked for me, and I've seen others that did not. – EugeneP Feb 15 '10 at 07:28
  • Well, Google allows the + sign, so every smart Gmail user can do that. ;P – Rihards Jun 10 '11 at 22:28
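As EugeneP says, `+` can be added inside the square brackets of the local part. A sketch of that one-character change (names are illustrative). It also shows a subtler problem with the unmodified regex during extraction: instead of rejecting `foo+news@gmail.com`, `find()` quietly returns the tail `news@gmail.com`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlusAddressing {
    // The answer's regex, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // Same regex with '+' added inside both local-part character classes.
    static final Pattern WITH_PLUS = Pattern.compile(
            "[_A-Za-z0-9+-]+(\\.[_A-Za-z0-9+-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // First match in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Reply to foo+news@gmail.com";
        System.out.println(firstMatch(ORIGINAL, text));  // news@gmail.com ("foo+" is silently lost)
        System.out.println(firstMatch(WITH_PLUS, text)); // foo+news@gmail.com
    }
}
```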

I had to add dashes to the domain part to allow for hyphenated domain names. So the final result in Java:

    final String MAIL_REGEX = "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";
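To illustrate what the extra dashes buy, this sketch (class and method names are illustrative) runs the earlier regex and this answer's `MAIL_REGEX` over text containing a hyphenated domain. With the earlier regex, nothing is extracted at all.

```java
import java.util.regex.Pattern;

final class HyphenDomains {
    // Regex from the earlier answer: '-' is not allowed after the @.
    static final Pattern BEFORE = Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // MAIL_REGEX from this answer: '-' added to the domain character classes.
    static final String MAIL_REGEX =
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";
    static final Pattern AFTER = Pattern.compile(MAIL_REGEX);

    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Contact jane@my-company.com please";
        System.out.println(found(BEFORE, text)); // false: nothing is extracted
        System.out.println(found(AFTER, text));  // true
    }
}
```

One side effect worth knowing: because `-` may now appear anywhere in a domain label, the pattern also accepts labels that start or end with a hyphen, such as `a@-example.com`, which real hostnames do not allow.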
Marcus
thealy

Install this regex tester plugin into Eclipse and you'll have a whale of a time testing regexes: http://brosinski.com/regex/

Points to note:
In the plugin, use only one backslash for each character escape. When you transcribe the regex into a Java/C# string, however, you have to double them, because there are two layers of escaping: the Java/C# string literal consumes one backslash, and the regex engine then interprets the one that remains as the actual regex escape.
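A small Java sketch of the double escaping (names are illustrative). The tester's `a\.b` becomes the string literal `"a\\.b"`; writing `"a\.b"` in Java source does not even compile, because `\.` is not a valid string escape sequence.

```java
import java.util.regex.Pattern;

final class EscapingDemo {
    // Typed into a regex tester, this is:  a\.b   (one backslash)
    // As a Java string literal it becomes: "a\\.b" (the compiler turns \\ into \)
    static final String LITERAL_DOT = "a\\.b";

    // Without the backslash, '.' matches any character.
    static final String ANY_CHAR = "a.b";

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input); // the whole input must match
    }

    public static void main(String[] args) {
        System.out.println(LITERAL_DOT);                 // a\.b (what the regex engine receives)
        System.out.println(matches(LITERAL_DOT, "a.b")); // true
        System.out.println(matches(LITERAL_DOT, "axb")); // false: \. only matches a real dot
        System.out.println(matches(ANY_CHAR, "axb"));    // true
    }
}
```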

Surround the sections of the regex whose text you wish to capture with round brackets (parentheses). You can then use the group methods of the Java or C# regex API to retrieve the values of those sections.

    ([_A-Za-z0-9-]+)(\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\.[A-Za-z0-9]+)

For example, using the above regex, the following string

    abc.efg@asdf.cde

yields

    start=0, end=16
    Group(0) = abc.efg@asdf.cde
    Group(1) = abc
    Group(2) = .efg
    Group(3) = asdf
    Group(4) = .cde

Group 0 always captures the whole matched string.

If you do not enclose any section in parentheses, you can still detect a match and read the whole match (group 0), but you cannot capture its individual parts.
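The same group extraction in Java, as a sketch: the tester regex above with every backslash doubled, printing the start/end offsets and each group (`GroupDemo` and `groups` are illustrative names).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // The tester regex from above, with each backslash doubled for a Java string.
    static final Pattern EMAIL_PARTS = Pattern.compile(
            "([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)@([A-Za-z0-9]+)(\\.[A-Za-z0-9]+)");

    // Returns group 0 through groupCount() of the first match, or an empty array.
    static String[] groups(String input) {
        Matcher m = EMAIL_PARTS.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        Matcher m = EMAIL_PARTS.matcher("abc.efg@asdf.cde");
        if (m.find()) {
            System.out.println("start=" + m.start() + ", end=" + m.end());
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("Group(" + i + ") = " + m.group(i));
            }
        }
    }
}
```

Note that groups 2 and 4 are not optional in this particular regex, so a match needs exactly one dot on each side of the `@`; `john@example.com`, for instance, is not found at all.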

It might be less confusing to create several smaller regexes than one long catch-all regex, since you can test them programmatically one by one and then decide which ones should be consolidated. This is especially useful when you come across an email pattern you had never considered before.
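A sketch of the "several smaller regexes" idea, under the assumption that the address is split at its last `@` and each part gets its own pattern. The patterns and names below are illustrative, not taken from any answer; reporting which check failed first makes it easy to see which piece needs adjusting when a new address format turns up.

```java
import java.util.regex.Pattern;

final class StepwiseCheck {
    // Hypothetical smaller patterns, one per part of the address.
    static final Pattern LOCAL_PART = Pattern.compile("[_A-Za-z0-9+-]+(\\.[_A-Za-z0-9+-]+)*");
    static final Pattern DOMAIN = Pattern.compile("[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}");

    // Runs the checks one at a time and names the first one that fails; null means all passed.
    static String firstFailure(String address) {
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            return "missing local part or domain";
        }
        if (!LOCAL_PART.matcher(address.substring(0, at)).matches()) {
            return "local part";
        }
        if (!DOMAIN.matcher(address.substring(at + 1)).matches()) {
            return "domain";
        }
        return null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc.efg@asdf.cde", "john doe@example.com", "jane@my_company.com"}) {
            System.out.println(s + " -> " + firstFailure(s));
        }
    }
}
```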

Blessed Geek
  • @h2g2java Speaking for myself, I already use a similar plugin. I appreciate your answer very much, because I also find that working with regular expressions without such tools can be a nightmare. I'm sure your answer will help many people save time. – EugeneP Feb 15 '10 at 07:23

A little late, but OK.

Here is what I use. Just paste it into the Firebug console and run it. Then look on the web page for a textarea (most likely at the bottom of the page). It will contain a comma-separated list of all email addresses found in the href attributes of A tags.

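    // Load jQuery into the page; the script arrives asynchronously.
    var jquery = document.createElement('script');
    jquery.setAttribute('src', 'http://code.jquery.com/jquery-1.10.1.min.js');

    // Output box; setAttribute needs both an attribute name and a value.
    var list = document.createElement('textarea');
    list.setAttribute('id', 'emaillist');
    document.body.appendChild(list);

    // Run the extraction only after jQuery has finished loading
    // (jQuery is the same object as $ once the library is in place).
    jquery.onload = function () {
        var lijst = ""; // "lijst" is Dutch for "list"
        jQuery("#emaillist").val("");
        jQuery("a").each(function (idx, el) {
            // Keep only links whose href contains "@" (typically mailto: links).
            var mail = jQuery(el).filter('[href*="@"]').attr("href");
            if (mail) {
                lijst += mail.replace("mailto:", "") + ",";
            }
        });
        jQuery("#emaillist").val(lijst);
    };
    document.body.appendChild(jquery);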
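For comparison, a rough Java sketch of the same idea applied to raw HTML with a regex instead of jQuery. It assumes double-quoted `href` attributes and, unlike the script above, only collects `mailto:` links (dropping any `?subject=...` suffix); a real HTML parser would be more robust. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MailtoLinks {
    // Assumes href="mailto:..." with double quotes; the capture stops at '"' or '?'.
    private static final Pattern MAILTO =
            Pattern.compile("href\\s*=\\s*\"mailto:([^\"?]+)", Pattern.CASE_INSENSITIVE);

    // Comma-separated list of the addresses found (no trailing comma).
    static String commaSeparated(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = MAILTO.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return String.join(",", found);
    }

    public static void main(String[] args) {
        String html = "<a href=\"mailto:a@example.com\">A</a> <a href=\"/about\">About</a>";
        System.out.println(commaSeparated(html));
    }
}
```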
Digital Human

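Android's built-in email address pattern (Patterns.EMAIL_ADDRESS) works well for this. Note that android.util.Patterns is part of the Android SDK, not the standard Java library:

    import android.util.Patterns;
    import androidx.annotation.NonNull;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;

    // Returns every substring of input that matches Android's built-in email pattern.
    public static List<String> getEmails(@NonNull String input) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = Patterns.EMAIL_ADDRESS.matcher(input);
        while (matcher.find()) {
            int matchStart = matcher.start(0);
            int matchEnd = matcher.end(0);
            emails.add(input.substring(matchStart, matchEnd));
        }
        return emails;
    }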
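`android.util.Patterns` is not available on a regular JVM. A plain-Java sketch with the same loop, using a hand-written pattern modeled on Android's email pattern; treat the exact pattern as an assumption and check it against the Android sources if you need identical behavior. `PlainJavaEmails` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlainJavaEmails {
    // Assumed approximation of Android's EMAIL_ADDRESS pattern, not a verified copy.
    static final Pattern EMAIL_ADDRESS = Pattern.compile(
            "[a-zA-Z0-9+._%-]{1,256}"
            + "@"
            + "[a-zA-Z0-9][a-zA-Z0-9-]{0,64}"
            + "(\\.[a-zA-Z0-9][a-zA-Z0-9-]{0,25})+");

    // Same extraction loop as the Android version above.
    static List<String> getEmails(String input) {
        List<String> emails = new ArrayList<>();
        Matcher matcher = EMAIL_ADDRESS.matcher(input);
        while (matcher.find()) {
            emails.add(matcher.group());
        }
        return emails;
    }

    public static void main(String[] args) {
        System.out.println(getEmails("Contact foo+bar@gmail.com or support@my-company.co.uk."));
    }
}
```

Because each domain label must start with a letter or digit, a sentence-ending period after `co.uk` is not swallowed into the match.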
Duy Pham