I'm not able to remove the Unicode control character \u0085 from a string in Java. How can this be done?

String teststr = "\u0000\u001f\u0085 hi \n";
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(teststr);
String st = teststr.replaceAll("\\p{Cntrl}", "");
out.println(st);

The character \u0085 gets printed as ? and doesn't seem to get replaced.

imulsion
user1101293
    related: http://stackoverflow.com/questions/6198986/how-can-i-replace-non-printable-unicode-characters-in-jav – Nicktar May 07 '13 at 09:46

1 Answer

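    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The wrapper class name is arbitrary; it only makes the snippet compile.
    public class TrimDemo {
        // Replaces every character outside the ASCII range (U+0000-U+007F) with a space.
        public static String trimUtf16(String test) {
            Pattern unicode = Pattern.compile("[^\\x00-\\x7F]",
                    Pattern.UNICODE_CASE | Pattern.CANON_EQ
                            | Pattern.CASE_INSENSITIVE);
            Matcher matcher = unicode.matcher(test);
            return matcher.replaceAll(" ");
        }

        public static void main(String[] args) {
            // \u0085 becomes a space; \u0000, \u001f and \n are ASCII, so they stay.
            System.out.println(trimUtf16("\u0000\u001f\u0085 hi \n"));
        }
    }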
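As for why the `\\p{Cntrl}` attempt in the question leaves \u0085 in place: by default Java treats `\p{Cntrl}` as the POSIX class `[\x00-\x1F\x7F]`, which is ASCII-only, so U+0085 never matches. A minimal sketch of two alternatives follows, assuming the goal is to drop every Unicode control character (general category Cc, which includes U+0080 to U+009F). The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the original answer.
class ControlCharStripper {
    // \p{Cc} is the Unicode general category "Control":
    // U+0000-U+001F and U+007F-U+009F, so it covers U+0085.
    private static final Pattern UNICODE_CONTROL = Pattern.compile("\\p{Cc}");

    // Alternative: keep \p{Cntrl} but request Unicode semantics (Java 7+).
    private static final Pattern CNTRL_WITH_FLAG =
            Pattern.compile("\\p{Cntrl}", Pattern.UNICODE_CHARACTER_CLASS);

    static String stripControls(String input) {
        return UNICODE_CONTROL.matcher(input).replaceAll("");
    }

    static String stripControlsWithFlag(String input) {
        return CNTRL_WITH_FLAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String teststr = "\u0000\u001f\u0085 hi \n";
        // Brackets make the surviving spaces visible. The trailing \n is
        // itself a control character, so it is removed as well.
        System.out.println("[" + stripControls(teststr) + "]");         // prints "[ hi ]"
        System.out.println("[" + stripControlsWithFlag(teststr) + "]"); // prints "[ hi ]"
    }
}
```

Unlike the `[^\\x00-\\x7F]` pattern above, this leaves non-ASCII letters such as é untouched, but it also strips ASCII control characters like tab and newline, so narrow the pattern if those must survive.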
flavian
  • Thanks, this worked, though I don't think I understand the code. I have an input stream in UTF-8 and I want to strip the '\u0085' character from it. I was testing with a string to see whether I could replace this character there. – user1101293 May 07 '13 at 10:01
  • I get it: \u0085 is a UTF-16 code unit, and the UTF-8 encoding of U+0085 is 0xC2 0x85. Thanks, your response was very useful. – user1101293 May 07 '13 at 10:14
  • I have an input stream that contains the valid UTF-8 sequence 0xC2 0x85 (U+0085). How can I read this correctly in Java? I don't think a byte array helps, as 0x85 is out of range. Basically, I need to read UTF-8 characters, including 0xC2 0x85, coming from a socket in Java. – user1101293 May 07 '13 at 15:45
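On the last comment: a Java `byte` is signed (-128 to 127), so 0x85 needs a cast, `(byte) 0x85`, but the bit pattern is stored unchanged and a UTF-8 decoder reads it correctly. A minimal sketch, assuming the stream really is UTF-8: wrap it in an `InputStreamReader` with an explicit charset rather than handling raw bytes. The class and method names are hypothetical, and a `ByteArrayInputStream` stands in for the socket's `getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
class Utf8StreamReader {
    // Decodes the whole stream as UTF-8 and returns it as a String.
    static String readAllUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for socket.getInputStream(): U+0085 encoded as 0xC2 0x85, then "hi".
        byte[] wire = {(byte) 0xC2, (byte) 0x85, 'h', 'i'};
        String decoded = readAllUtf8(new ByteArrayInputStream(wire));

        System.out.println(decoded.length());               // prints 3: U+0085, 'h', 'i'
        System.out.println(decoded.replace("\u0085", ""));  // prints "hi"
    }
}
```

Once the bytes are decoded, the character is an ordinary `char` with value 0x85 and can be removed with `String.replace` or a regex, exactly as in the string test.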