0

I'm working on reading some utf-16 and ascii mixed files such as: \u6b64\u626b\u63cf: abc

And I will get "\\u6b64\\u626b\\u63cf: abc"(string length is 29) in java. How can I convert it into "\u6b64\u626b\u63cf: abc"(string length is 8)?

I know there is StringEscapeUtils in Apache Commons library, but I prefer not to use outer libraries.

Or, is it possible that I can read it directly to "\u6b64\u626b\u63cf: abc"?

Carpusher
  • 1
  • 2

1 Answers1

0

It seems not to have a general lib to use. But I found this private method in the Properties class of java, which totally solves my problem.

private String loadConvert (char[] in, int off, int len, char[] convtBuf) {
    if (convtBuf.length < len) {
        int newLen = len * 2;
        if (newLen < 0) {
            newLen = Integer.MAX_VALUE;
        }
        convtBuf = new char[newLen];
    }
    char aChar;
    char[] out = convtBuf;
    int outLen = 0;
    int end = off + len;
    while (off < end) {
        aChar = in[off++];
        if (aChar == '\\') {
            aChar = in[off++];
            if(aChar == 'u') {
                // Read the xxxx
                int value=0;
                for (int i=0; i<4; i++) {
                    aChar = in[off++];
                    switch (aChar) {
                      case '0': case '1': case '2': case '3': case '4':
                      case '5': case '6': case '7': case '8': case '9':
                         value = (value << 4) + aChar - '0';
                         break;
                      case 'a': case 'b': case 'c':
                      case 'd': case 'e': case 'f':
                         value = (value << 4) + 10 + aChar - 'a';
                         break;
                      case 'A': case 'B': case 'C':
                      case 'D': case 'E': case 'F':
                         value = (value << 4) + 10 + aChar - 'A';
                         break;
                      default:
                          throw new IllegalArgumentException(
                                       "Malformed \\uxxxx encoding.");
                    }
                 }
                out[outLen++] = (char)value;
            } else {
                if (aChar == 't') aChar = '\t';
                else if (aChar == 'r') aChar = '\r';
                else if (aChar == 'n') aChar = '\n';
                else if (aChar == 'f') aChar = '\f';
                out[outLen++] = aChar;
            }
        } else {
            out[outLen++] = aChar;
        }
    }
    return new String (out, 0, outLen);
}
Carpusher
  • 1
  • 2