4

I have a String that contains special characters as well as white space. I want to remove the special characters and the extra white space. I am doing it like this:

String str = "45,%$^ Sharma%$&^,is,46&* a$# Java#$43 Developer$#$^ in#$^ CST&^* web*&(,but he%^&^% wants to move@!$@# to another team";
System.out.println(str.replaceAll("[^a-zA-Z]", " ").replaceAll("\\s+", " "));

Output:

Sharma is a Java Developer in CST web but he wants to move to another team

Can I do this in a single operation? How?

Devendra
  • possible duplicate of [replace special characters in string in java](http://stackoverflow.com/questions/2608205/replace-special-characters-in-string-in-java) – Joe Sep 12 '14 at 13:40

4 Answers

20

Replace any sequence of non-letters with a single white space:

str.replaceAll("[^a-zA-Z]+", " ")

You also might want to apply trim() after the replace.

If you want to support languages other than English, use "[^\\p{IsAlphabetic}]+" or "[^\\p{IsLetter}]+". See this question about the differences.
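As a minimal sketch of the suggestions above (the class name `CollapseNonLetters` and both helper methods are made up for illustration and are not part of the original answer), the single-call replacement combined with trim() could look like this:

```java
public class CollapseNonLetters {

    // Hypothetical helper: replace each run of non-ASCII-letters with one space,
    // then trim the space that a leading or trailing run leaves behind.
    static String lettersOnly(String input) {
        return input.replaceAll("[^a-zA-Z]+", " ").trim();
    }

    // Hypothetical variant: same idea, but keeps any character that has the
    // Unicode Alphabetic property instead of only A-Z and a-z.
    static String unicodeLettersOnly(String input) {
        return input.replaceAll("[^\\p{IsAlphabetic}]+", " ").trim();
    }

    public static void main(String[] args) {
        String str = "45,%$^ Sharma%$&^,is,46&* a$# Java#$43 Developer$#$^ in#$^ CST&^* web*&(,but he%^&^% wants to move@!$@# to another team";

        // Prints: Sharma is a Java Developer in CST web but he wants to move to another team
        System.out.println(lettersOnly(str));

        // The accented letters survive in the Unicode-aware variant.
        System.out.println(unicodeLettersOnly("42 caf\u00e9!! ni\u00f1o"));
    }
}
```

Without trim(), the result for the string from the question would start with a single space, because the leading run "45,%$^ " is also replaced by one space.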

Cephalopod
  • This does not solve the OP's problem of reducing the spaces to one single space. "# " should be replaced by " ", not 2 spaces. – Joffrey Apr 08 '14 at 11:54
  • Never mind, I was misled by the previous post including \\s, forgetting that the negation already covers the spaces ^^ – Joffrey Apr 08 '14 at 11:57
  • 2
    I was about to write the same answer. Working code here: http://ideone.com/QbszDu – ssssteffff Apr 08 '14 at 11:58
  • I don't understand the "+" in [^a-zA-Z]+. Please explain what the plus (+) does. – Devendra Apr 08 '14 at 12:25
  • In [regexes](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html), `+` is a quantifier that means that the pattern is repeated one or more times. (`*` means any number of times, including zero; `?` means optional) – Cephalopod Apr 08 '14 at 12:37
  • Thanks Arian, I know that, but I can't see how it is removing the spaces here. – Devendra Apr 08 '14 at 13:30
  • `[^a-zA-Z]` matches any character that is not a letter, including spaces. The `+` makes the whole sequence match at once (look [here](http://regexpal.com/?flags=g&regex=[^a-zA-Z]%2B&input=45%2C%25%24^%20Sharma%25%24%26^%2Cis%2C46%26*%20a%24%23%20java%23%2443%20Developer%24%23%24^%20in%23%24^%20CST%26^*%20web*%26%28%2Cbut%20He%25^%26^%25%20want%20to%20move%40!%24%40%23%20in%20another%20team) for a visualization). The sequence is then replaced with a single space. – Cephalopod Apr 08 '14 at 13:37
  • He did not ask for even a single space, so why are you adding one? It should be str.replaceAll("[^a-zA-Z]+", "") – John Oct 04 '16 at 13:42
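
To make the effect of the `+` quantifier discussed in these comments concrete, here is a small sketch (the class `QuantifierDemo` and its method names are hypothetical) that applies the pattern with and without the quantifier to a fragment of the string from the question:

```java
public class QuantifierDemo {

    // Without "+": every single non-letter becomes its own space.
    static String perCharacter(String input) {
        return input.replaceAll("[^a-zA-Z]", " ");
    }

    // With "+": a whole run of non-letters is matched once and becomes one space.
    static String perRun(String input) {
        return input.replaceAll("[^a-zA-Z]+", " ");
    }

    public static void main(String[] args) {
        String sample = "a$# Java"; // '$', '#' and ' ' form one run of three non-letters

        System.out.println("[" + perCharacter(sample) + "]"); // prints [a   Java] (three spaces)
        System.out.println("[" + perRun(sample) + "]");       // prints [a Java]   (one space)
    }
}
```

The space in the input is itself a non-letter, so it is part of the matched run; that is why no separate \\s+ pass is needed.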
4

The OR operator (|) should work:

System.out.println(str.replaceAll("([^a-zA-Z]|\\s)+", " "));

Actually, the \\s alternative doesn't have to be there at all, because [^a-zA-Z] already matches white space:

System.out.println(str.replaceAll("[^a-zA-Z]+", " "));
TomasZ.
3

Try this:

str.replaceAll("[\\p{Punct}\\s\\d]+", " ");

This replaces each run of punctuation, digits, and white space with a single space.
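
One difference from the [^a-zA-Z]+ approach is worth illustrating: this pattern lists what to discard rather than what to keep, so characters outside the listed classes are left alone. The sketch below (class `DenyVsAllow` and its method names are invented for this comparison) assumes Java's default regex mode, in which \\p{Punct}, \\s and \\d cover ASCII characters only:

```java
public class DenyVsAllow {

    // Pattern from this answer: discard punctuation, white space and digits.
    static String denyList(String input) {
        return input.replaceAll("[\\p{Punct}\\s\\d]+", " ");
    }

    // Pattern from the other answers: keep only A-Z and a-z.
    static String allowList(String input) {
        return input.replaceAll("[^a-zA-Z]+", " ");
    }

    public static void main(String[] args) {
        // For pure ASCII input such as the question's string, both behave the same.
        System.out.println(denyList("a$# Java#$43"));  // prints "a Java " (with a trailing space)
        System.out.println(allowList("a$# Java#$43")); // prints "a Java " (with a trailing space)

        // Accented letters are kept by the deny list but dropped by the allow list.
        String accented = "na\u00efve_user 42 caf\u00e9";
        System.out.println(denyList(accented));
        System.out.println(allowList(accented));
    }
}
```

Which behavior is preferable depends on whether the input can contain characters outside ASCII.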

Sabuj Hassan
0

You can use the following to remove anything that is not a letter (A-Z or a-z):

str.replaceAll("[^a-zA-Z]", "");
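
A short sketch of what this does to the string from the question (the class `StripNonLetters` and the method `strip` are hypothetical names). Two things are easy to miss: spaces are non-letters too, so the words run together, and Java strings are immutable, so the result has to be assigned or used:

```java
public class StripNonLetters {

    // Hypothetical helper: delete every character that is not A-Z or a-z.
    static String strip(String input) {
        return input.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        String str = "45,%$^ Sharma%$&^,is,46&* a$# Java#$43 Developer$#$^ in#$^ CST&^* web*&(,but he%^&^% wants to move@!$@# to another team";

        // replaceAll returns a new String; str itself is left unchanged.
        String stripped = strip(str);

        // Prints: SharmaisaJavaDeveloperinCSTwebbuthewantstomovetoanotherteam
        System.out.println(stripped);
    }
}
```

If the words should stay separated, replacing each run of non-letters with a single space, as in the other answers, is the closer fit to the output shown in the question.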
Al-Mustafa Azhari