21

I have a URI that contains non-ASCII characters, for example:

http://www.abc.de/qq/qq.ww?MIval=typo3_bsl_int_Smtliste&p_smtbez=Schmalbl�ttrigeSomerzischeruchtanb

How can I remove the "�" from this URI?

DaveyDaveDave
M.M
    The set of possible characters is large compared to the set of characters allowed in the [query part of a URI](http://illegalargumentexception.blogspot.co.uk/2009/12/java-safe-character-handling-and-url.html#URI2009_HTML). To delete all non-English text would exclude many languages. Is this what you want? Or do you want to percent-encode the text? Or do you want to [transliterate](http://en.wikipedia.org/wiki/Transliteration) the text to English? – McDowell May 13 '12 at 18:57
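To make the two alternatives from the comment concrete, here is a minimal sketch of percent-encoding a query value and of stripping accents as a crude form of transliteration. The class name, the helper names and the sample value are made up for illustration. Note that URLEncoder targets form encoding (a space becomes "+"), and that dropping combining marks is not a proper German transliteration (which would give "ae" for "ä").

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

public class QueryValueDemo {

    // Hypothetical helper: percent-encodes a single query value as UTF-8.
    static String percentEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: decomposes accented letters (NFD) and drops the combining marks.
    static String stripAccents(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String value = "Bl\u00E4tter"; // assumed sample value containing a-umlaut
        System.out.println(percentEncode(value)); // Bl%C3%A4tter
        System.out.println(stripAccents(value));  // Blatter
    }
}
```

Only the value should be encoded this way, not the whole URI, otherwise the "?", "&" and "=" separators get escaped as well.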

5 Answers

41

I'm guessing that the source of the URL is more at fault. Perhaps you're fixing the wrong problem? Removing "strange" characters from a URI might give it an entirely different meaning.

With that said, you may be able to remove all of the non-ASCII characters with a simple string replacement:

String fixed = original.replaceAll("[^\\x20-\\x7e]", "");

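Or, if you only want to drop characters outside the Basic Multilingual Plane (the ones that take four bytes in UTF-8), use the pattern below. Note that "�" (U+FFFD) lies inside that range, so this pattern keeps it: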

String fixed = original.replaceAll("[^\\u0000-\\uFFFF]", "");
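As a quick check of what each of the two patterns actually removes, here is a small sketch. The class name, the helper names and the sample string are assumptions for illustration; the sample contains the replacement character U+FFFD plus one supplementary character (an emoji, written as a surrogate pair).

```java
public class NonAsciiDemo {

    // Hypothetical helper: keeps printable ASCII only (space through tilde).
    static String keepPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7e]", "");
    }

    // Hypothetical helper: keeps everything in the Basic Multilingual Plane
    // and drops supplementary characters (four bytes in UTF-8).
    static String keepBmp(String s) {
        return s.replaceAll("[^\\u0000-\\uFFFF]", "");
    }

    public static void main(String[] args) {
        // Assumed sample: replacement character followed by an emoji.
        String original = "Schmalbl\uFFFDttrige\uD83D\uDE00";

        // Both the replacement character and the emoji are removed.
        System.out.println(keepPrintableAscii(original));

        // Only the emoji is removed; the replacement character survives.
        System.out.println(keepBmp(original).length());
    }
}
```

So for the "�" in the question, only the first pattern helps; the second one is useful when supplementary characters are the problem.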
Vijin Paulraj
Cᴏʀʏ
21
yourstring = yourstring.replaceAll("[^\\p{ASCII}]", "");
daneshkohan
7

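No no no, [^\x20-\x7E] is not "everything that is not ASCII". It matches everything outside the printable ASCII range.

The complement of the full ASCII table is: [^\x00-\x7F]

With the first pattern, the replacement also strips newlines, tabs and other control characters that are part of the ASCII table!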
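A short sketch of the difference, with made-up class and helper names and an assumed sample string that contains a newline, a tab and one non-ASCII letter:

```java
public class AsciiRangeDemo {

    // Hypothetical helper: removes everything outside printable ASCII (0x20 to 0x7E).
    static String stripOutsidePrintable(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    // Hypothetical helper: removes everything outside the full ASCII table (0x00 to 0x7F).
    static String stripOutsideAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String input = "line1\n\tcaf\u00E9"; // assumed sample: newline, tab, e-acute

        // The newline and tab are lost along with the accented letter.
        System.out.println(stripOutsidePrintable(input).equals("line1caf")); // true

        // Only the accented letter is removed; the newline and tab stay.
        System.out.println(stripOutsideAscii(input).equals("line1\n\tcaf")); // true
    }
}
```

For a URI the narrower printable range is arguably what you want anyway, since control characters are not valid there; for general text the full range is the safer default.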

Riduidel
Peter L
6

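To remove the non-ASCII characters from a String, the code below worked for me. The \p{ASCII} class matches any ASCII character, so the negated class matches everything else.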

String str="<UPC>616043287409ÂÂÂÂ</UPC>";

str = str.replaceAll("[^\\p{ASCII}]", "");

Output:

<UPC>616043287409</UPC>
Pritam Banerjee
Yellesh Chaparthi
    Please try to avoid just dumping code as an answer and try to explain what it does and why. Your code might not be obvious for people who do not have the relevant coding experience. – Frits Aug 08 '16 at 14:36
4

Use Guava's CharMatcher (com.google.common.base.CharMatcher):

String onlyAscii = CharMatcher.ascii().retainFrom(original);
Juan Rada