5

How can I replace the string in Java?

E.g.,

String a = "adf�sdf";

How can I replace and avoid special characters?

Peter Mortensen
zahir
  • 3
    Welcome to SO, zahir! Where are you getting your strings from? Random users? A web service? Are you trying to replace something with that string, or use that string to replace something else? – Pops Apr 09 '10 at 14:24
  • It looks like [Mojibake](https://en.wikipedia.org/wiki/Mojibake) - *"...the garbled text that is the result of text being decoded using an unintended character encoding."* – Peter Mortensen Feb 01 '23 at 19:03

4 Answers

14

You can get rid of all characters outside the printable ASCII range using String#replaceAll() by replacing the pattern [^\\x20-\\x7e] with an empty string:

a = a.replaceAll("[^\\x20-\\x7e]", "");
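
As a runnable sketch of the line above: the class and method names here are made up for illustration, and it assumes the garbled character in the question is the Unicode replacement character U+FFFD, which the question does not confirm.

```java
import java.util.regex.Pattern;

final class AsciiFilter {
    // Matches any character outside the printable ASCII range 0x20 (space) to 0x7E (~).
    private static final Pattern NON_PRINTABLE_ASCII = Pattern.compile("[^\\x20-\\x7e]");

    // Hypothetical helper: returns the input with every non-printable-ASCII character removed.
    static String stripNonPrintableAscii(String input) {
        return NON_PRINTABLE_ASCII.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumption: the garbled character is U+FFFD, written here as a Unicode escape.
        String a = "adf\uFFFDsdf";
        System.out.println(stripNonPrintableAscii(a)); // prints "adfsdf"
    }
}
```

Note that this also drops tabs, line breaks, and every legitimate non-ASCII letter (accented characters, for example), so it is only suitable when plain ASCII output is really what you want.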

But this doesn't solve your actual problem; it's more of a workaround. With the given information it's hard to nail down the root cause, but reading either of these articles should help a lot:

BalusC
  • Hmm, there seems to be a markdown bug (link 2 isn't correctly parsed), but I can't seem to locate/fix it? – BalusC Apr 09 '10 at 14:28
  • 1
    @BalusC: Happens to me all the time (since I link to the Java6 docs a lot), you want to replace the space near the end with `%20`. – T.J. Crowder Apr 09 '10 at 14:30
  • @T.J. yes, that was it, thanks :) BTW: Firefox normally escapes them before pasting, but it didn't happen correctly for some odd reason. I re-created the link and the problem went away. – BalusC Apr 09 '10 at 14:31
  • @BalusC: I find it very ironic that you point to a Joel article... His first article on Unicode was full of errors and misunderstandings: I remember him posting it and thinking "WTF!?". It was a memorable "ah ah, I got it" moment from Joel that was *full* of errors. In fact, it's since he posted that first article on Unicode that I started taking *everything* he has ever said, and keeps saying, with a huge grain of salt ;) – SyntaxT3rr0r Apr 09 '10 at 15:11
  • @Wiz: That was also one of the reasons I wrote another one myself, to clarify things further in simple terms and with practical examples and solutions. But are there really that *many* errors in Joel's article, as you seem to insinuate? – BalusC Apr 09 '10 at 15:22
  • The only significant errors I see are (1) he says UTF-8 uses up to six bytes per character (which was true when he wrote the article, but was changed a month later), and (2) he implies that UTF-16 and UCS-2 are equivalent (which was never true). – Alan Moore Apr 10 '10 at 03:09
2

It is hard to answer the question without knowing more of the context.

In general, you might have an encoding problem. See The Absolute Minimum Every Software Developer (...) Must Know About Unicode and Character Sets for an overview of character encodings.
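
To make the encoding problem concrete, here is a minimal sketch of how such a string can arise. It assumes the original text contained a non-ASCII letter (an "ä" is used as a stand-in) that was written as ISO-8859-1 and then read back as UTF-8; the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Hypothetical helper: encodes text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "adf\u00e4sdf"; // "adf", a-umlaut, "sdf"

        // Mismatched charsets: the lone byte 0xE4 is not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        String garbled = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(garbled.contains("\uFFFD")); // prints "true"

        // Matching charsets: the text survives unchanged.
        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original)); // prints "true"
    }
}
```

If this is what happened, the real fix is to pass the correct charset wherever the bytes are turned into a String (file reader, HTTP response, database connection), not to strip characters afterwards.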

Daniel Rikowski
2

Assuming that you want to remove all special characters, you can use the character class \p{Cntrl}. Then you only need to use the following code:

String cleaned = stringWithSpecialCharacters.replaceAll("\\p{Cntrl}", replacement);
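
A self-contained sketch of the same idea follows. The helper name is hypothetical, and `Matcher.quoteReplacement` is added here (it is not part of the one-liner above) so that a replacement containing `$` or `\` is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ControlCharFilter {
    // \p{Cntrl} is the POSIX class for ASCII control characters: 0x00-0x1F and 0x7F.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    // Hypothetical helper: replaces every ASCII control character with the given text.
    static String replaceControlChars(String input, String replacement) {
        return CONTROL.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "line1\r\nline2\tend";
        // CR, LF and TAB are each replaced by a single space.
        System.out.println(replaceControlChars(s, " ")); // prints "line1  line2 end"
    }
}
```

Strings are immutable in Java, so the result has to be assigned or returned. Also, this class does not match U+FFFD or other non-ASCII characters, so it would leave a string like the one in the question unchanged if the odd character there is not a control character.
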
Peter Mortensen
ablaeul
  • 1
    That works if you assume "special characters" means ASCII control characters. In my experience it usually means punctuation, but in this case it's anyone's guess. – Alan Moore May 04 '10 at 19:12
0

You can use Unicode escape sequences (such as \u201c [an opening curly quote]) to "avoid" characters that can't be directly used in your source file encoding (which defaults to the default encoding for your platform, but you can change it with the -encoding parameter to javac).
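
For illustration, a small sketch of Unicode escapes in a string literal. The class and method names are made up; the source contains only ASCII characters, so it should compile the same way regardless of the -encoding setting.

```java
final class UnicodeEscapeDemo {
    // Hypothetical helper: wraps text in curly quotes written as Unicode escapes.
    static String curlyQuote(String text) {
        return "\u201c" + text + "\u201d";
    }

    public static void main(String[] args) {
        String quoted = curlyQuote("Hello");
        System.out.println(quoted.length());                         // prints "7"
        System.out.println(Integer.toHexString(quoted.charAt(0)));   // prints "201c"
        System.out.println(Integer.toHexString(quoted.charAt(6)));   // prints "201d"
    }
}
```

The code points are printed in hexadecimal rather than as characters because the console has an encoding of its own, which may not be able to display curly quotes.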

Peter Mortensen
T.J. Crowder
  • source file encoding defaults to the platform default encoding, i.e. usually not UTF-8. – Michael Borgwardt Apr 09 '10 at 14:32
  • @Michael: Thanks, fixed. I wasn't just inventing that, I wonder what language/environment it actually related to? ;-) Or was it true in 1996 or something... – T.J. Crowder Apr 09 '10 at 15:12
  • I doubt that, since UTF-8 wasn't specified until 1993, and Java instead used to have the recommendation to use native2ascii before distributing source code. I'd expect UTF-8 to be the default in some newer systems, though. – Michael Borgwardt Apr 09 '10 at 15:36
  • @Michael: 1993 is earlier than 1996, and I remember it being all nifty and cool that Java supported these weird Unicode things, so it's *possible*, though not likely. ;-) (`native2ascii`, crikey, that's a blast from the past) Thanks, though, the info pre-edit was clearly wrong in 2010 regardless! – T.J. Crowder Apr 09 '10 at 15:41