I have a string that contains an e-mail address. There may be extra characters before and/or after it. Input examples:

a1@b.com
a2@b.com abcd efg
x y z a3@b.com
p q a4@b.com x z
asd[x5@c.net]gh

I want to remove the extra characters.

Desired outputs:

a1@b.com
a2@b.com
a3@b.com
a4@b.com
x5@c.net

Valid characters are a-zA-Z0-9._ so the e-mail is probably preceded and/or followed by invalid characters.

I tried this code to check whether the string contains a correct e-mail (it assumes the address is separated from the extra characters by whitespace), but I cannot work out how to reduce the string to just the address (using s.replaceAll()):

if (s.matches("(?i).*\\s[a-zA-Z_\\.]+@[a-zA-Z_\\.]+\\.[a-zA-Z_\\.]+.*") ||
    fields[2].matches("(?i).*[a-zA-Z_\\.]+@[a-zA-Z_\\.]+\\.[a-zA-Z_\\.]+\\s.*"))
Alisa
  • 3
    Read [this](http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address) – Reimeus Aug 28 '14 at 19:03
  • 2
    Don't use `matches` or `replaceAll`. Instead, set up a [`Matcher`](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html), use `find` (which will find a pattern anywhere in the string), and use `group(0)` to return the string it finds. It will be much easier to tell it to return the matched string than to tell it to remove the unmatched characters. – ajb Aug 28 '14 at 19:04
  • 1
    Valid characters are `a-zA-Z0-9._` ... How does your regular expression match the `12345` in the email addresses in your desired output?? – hwnd Aug 28 '14 at 19:08
  • Valid characters are a-zA-Z0-9._ Edited. – Alisa Aug 28 '14 at 19:11
  • 2
    You're probably best off using this regex: \S@\S Your regex will miss a ton of valid email addresses. There are a lot of valid characters allowed in an email address. – tobii Aug 28 '14 at 19:21
  • My email address is `T41$+$UCK$@like.wtf`. That is a perfectly valid email address. Your regex will not allow it. – Qix - MONICA WAS MISTREATED Aug 28 '14 at 19:28
  • @tobii, Qix: My question was not about validating the characters. Since I have defined a set of valid characters, I only wanted to extract the email, as in Nemo's approach. – Alisa Aug 28 '14 at 20:59

3 Answers

You can use java.util.regex.Pattern and java.util.regex.Matcher.

This code will do what you ask for:

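import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it only makes the snippet compile on its own.
public class EmailExtractor {
    // Dot-separated local part, "@", dot-separated domain, then a TLD of at least two letters.
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})");

    public static void main(String[] args) {
        String[] testList = {"a1@b.com",
                "a2@b.com abcd efg",
                "x y z a3@b.com",
                "p q a4@b.com x z",
                "asd[a5@b.coom]gh"};

        for (String test : testList) {
            Matcher m = EMAIL_PATTERN.matcher(test);
            // find() locates the pattern anywhere in the input; group(0) is the matched text.
            while (m.find()) {
                System.out.println(m.group(0));
            }
        }
    }
}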
Nemo

Given your definition of valid characters, try:

^.*?([\w.]+@[\w.]+).*$

and replace with capturing group 1
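
In Java, that replacement could look roughly like the sketch below. The class and method names (`GroupReplaceSketch`, `extract`) are made up for illustration, and the backslashes are doubled because the expression sits in a string literal. If the input contains no `@`-separated match, `replaceAll` leaves it unchanged, so you may want to check for that case separately.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only here to make the sketch self-contained.
final class GroupReplaceSketch {
    // The expression from above, with backslashes doubled for a Java string literal.
    private static final Pattern WRAPPED_EMAIL =
            Pattern.compile("^.*?([\\w.]+@[\\w.]+).*$");

    // Replaces the whole input with capturing group 1.
    // Returns the input unchanged when the pattern does not match.
    static String extract(String input) {
        return WRAPPED_EMAIL.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] samples = {"a1@b.com", "a2@b.com abcd efg", "x y z a3@b.com",
                "p q a4@b.com x z", "asd[x5@c.net]gh"};
        for (String s : samples) {
            System.out.println(extract(s));
        }
    }
}
```

The lazy `.*?` at the start matters here: it gives up as few leading characters as possible, so group 1 begins at the first run of valid characters that is followed by `@`.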

Ron Rosenfeld

Truly validating an email address is not possible. You can only check that a string looks like an email address, and even that is quite tricky, given the new TLDs with more than 3 characters.

So it is better to pick up some "invalid" email addresses (sending mail to them will simply fail) than to miss a valid one.

Use

([a-zA-Z0-9!#$%&'*+-/=?^_`{|}~.]+\@(?:[a-zA-Z0-9.-]+|\[[0-9.]+\]))

to grab anything that could be an email address.
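
As a rough sketch of how the expression above could be used from Java with `Matcher.find()`: the class and method names (`PermissiveEmailFinder`, `firstCandidate`) are invented for this example, and returning an `Optional` is just one way to signal "nothing found". Note that `+-/` inside the character class is a range, so it also lets `,` through in the local part.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only here to make the sketch self-contained.
final class PermissiveEmailFinder {
    // Same expression as above, escaped for a Java string literal.
    // The second alternative after "@" accepts a bracketed numeric host such as [192.168.0.1].
    private static final Pattern CANDIDATE = Pattern.compile(
            "([a-zA-Z0-9!#$%&'*+-/=?^_`{|}~.]+\\@(?:[a-zA-Z0-9.-]+|\\[[0-9.]+\\]))");

    // Returns the first substring that looks like an address, if there is one.
    static Optional<String> firstCandidate(String input) {
        Matcher m = CANDIDATE.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstCandidate("asd[x5@c.net]gh").orElse("(none)"));
        System.out.println(firstCandidate("p q a4@b.com x z").orElse("(none)"));
        System.out.println(firstCandidate("no address here").orElse("(none)"));
    }
}
```

Because the pattern is deliberately permissive, a trailing dot or similar punctuation directly after the domain will be included in the match, so some trimming afterwards may still be needed.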

dognose