
How can I remove email addresses from a string, along with all digits and special characters?

A sample string could be:

"Hello world my # is 123 mail me @ test@test.com"

The output string should be:

"Hello world my is mail me"

I googled this and found that I can use the following regular expression:

"[^A-Za-z0-9\\.\\@_\\-~#]+"

but that example was meant for checking whether an email ID is valid, not for removing it. I am new to Java!

MC Emperor
user238384

4 Answers

As pointed out by others, you could use regular expressions to clean up your String and replace the unwanted parts with an empty string "". To do so, have a look at the replaceAll(String regex, String replacement) method of the String class and at the Pattern class for the syntax of regular expressions in Java.

The code below demonstrates one way to clean the provided sample String (though perhaps not the most elegant):

String input = "Hello world my # is 123 mail me @ test@test.com";
String EMAIL_PATTERN = "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

String output = input.replaceAll(EMAIL_PATTERN, "") // Remove email addresses
        .replaceAll("\\p{Punct}", "") // Remove all punctuation, i.e. one of
                                      // !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
        .replaceAll("\\d", "") // Remove every digit
        .replaceAll("\\p{Blank}{2,}+", " ") // Collapse each run of two or more
                                            // blanks (spaces or tabs) into a
                                            // single space
        .trim(); // Drop the trailing space left behind by the removals

System.out.println(output);

Running this code produces the following output:

Hello world my is mail me

If you need to remove more garbage (or less, like punctuation), well, you've got the principle. Adapt it to suit your needs.
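One way to adapt the idea is to precompile the patterns once and wrap the steps in a reusable method. The following is only a sketch: the class name TextCleaner and the method name clean are made up for illustration, the email pattern is the same loose one used above, and the result is trimmed so that it matches the output requested in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class TextCleaner {
    // Same loose email pattern as in the answer; it is not RFC-complete.
    private static final Pattern EMAIL =
            Pattern.compile("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)");
    // Punctuation and digits can be removed in a single pass.
    private static final Pattern PUNCT_OR_DIGIT = Pattern.compile("[\\p{Punct}\\d]");
    // Runs of two or more spaces/tabs.
    private static final Pattern BLANKS = Pattern.compile("\\p{Blank}{2,}");

    static String clean(String input) {
        // Emails must go first, while '@' and '.' are still present.
        String s = EMAIL.matcher(input).replaceAll("");
        s = PUNCT_OR_DIGIT.matcher(s).replaceAll("");
        s = BLANKS.matcher(s).replaceAll(" ");
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("Hello world my # is 123 mail me @ test@test.com"));
    }
}
```

Compiling the patterns once avoids recompiling them on every call, which String#replaceAll does internally each time it is invoked.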

Pascal Thivent

You can use String#replaceAll() for this. Just let it replace any regex matches with an empty string "". However, the regex you mentioned is not very robust. A better one is this (copied from here and slightly changed for use in plain vanilla text):

string = string.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "");

Hope this helps.

Community
BalusC
  • Note that he doesn't just want to remove email addresses, he wants to remove *all* "special" characters (for some unknown definition of "special", but clearly including numbers, a hash mark, and an at sign...). – delfuego Dec 30 '09 at 20:44
  • Oh. For that just use `string = string.replaceAll("[^\\p{Alpha}\\s]", "");` afterwards. – BalusC Dec 30 '09 at 21:47
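
The word "afterwards" in the comment above matters: if the non-alphabetic characters are stripped first, the '@' and '.' that the email pattern relies on are already gone, and the address survives as plain letters. A small sketch of the difference follows; the class and method names are hypothetical.

```java
// Hypothetical demo class; only the two regexes come from the thread.
class OrderMatters {
    static final String EMAIL =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";
    static final String NON_ALPHA = "[^\\p{Alpha}\\s]";

    // Intended order: remove the address, then everything non-alphabetic.
    static String emailFirst(String s) {
        return s.replaceAll(EMAIL, "").replaceAll(NON_ALPHA, "");
    }

    // Wrong order: the address loses '@' and '.', so EMAIL no longer matches.
    static String nonAlphaFirst(String s) {
        return s.replaceAll(NON_ALPHA, "").replaceAll(EMAIL, "");
    }

    public static void main(String[] args) {
        String input = "mail me @ test@test.com";
        System.out.println("[" + emailFirst(input) + "]");
        System.out.println("[" + nonAlphaFirst(input) + "]"); // still contains "testtestcom"
    }
}
```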
1

Check out the Java regular expression Pattern class and its uses. There's a useful tutorial here which includes replacement methods.

An aside: this is a particularly robust regexp to use for RFC822-compliant email addresses :-) You should be able to come up with something more concise for your needs! There's a discussion of email regexps and trade-offs here.
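
To make the trade-offs concrete, here is a small sketch that probes the short pattern quoted elsewhere in this thread (not the RFC822 one). It assumes nothing beyond String#replaceAll; the class and method names are invented for the demonstration. Two limitations show up: punctuation glued to an address is swallowed along with it, and an address whose domain has no dot is not recognised at all.

```java
// Hypothetical demo class probing the loose email pattern from this thread.
class EmailPatternLimits {
    static final String EMAIL =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    static String strip(String s) {
        return s.replaceAll(EMAIL, "");
    }

    public static void main(String[] args) {
        // The parentheses are neither '.', '@' nor whitespace, so they are
        // matched as part of the address and removed with it.
        System.out.println("[" + strip("write to (test@test.com) now") + "]");
        // The domain part requires at least one dot, so this is left untouched.
        System.out.println("[" + strip("ping root@localhost") + "]");
    }
}
```

Whether either behaviour is a problem depends on the input; for the cleanup asked about in the question, where punctuation is removed anyway, the first one is harmless.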

Brian Agnew

From your example, it looks like it's not just email addresses you're interested in removing, it's all non-alpha characters, so this is trivial:

str = str.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "")
         .replaceAll("[^\\p{Alpha} ]", "")
         .replaceAll("[ ]{2,}+", " ");

See the Pattern JavaDocs for information about what the special character class \p{Alpha} means...
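
One detail worth knowing from those docs: by default \p{Alpha} is a POSIX class that covers only the US-ASCII letters a-z and A-Z, so accented letters are stripped too. If the input may contain non-English text, the Unicode category \p{L} (any letter) is one possible alternative. The sketch below contrasts the two; the class and method names are made up for illustration.

```java
// Hypothetical demo class contrasting \p{Alpha} with \p{L}.
class AlphaClassDemo {
    // Keeps only ASCII letters and spaces (default meaning of \p{Alpha}).
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^\\p{Alpha} ]", "");
    }

    // Keeps any Unicode letter and spaces.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L} ]", "");
    }

    public static void main(String[] args) {
        String input = "caf\u00e9 42"; // "café 42", written with an escape for portability
        System.out.println("[" + keepAsciiLetters(input) + "]");   // the é is removed
        System.out.println("[" + keepUnicodeLetters(input) + "]"); // the é is kept
    }
}
```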

delfuego