
I am trying to convert a list of ASCII codes to a string. The problem is that I have trouble with some special characters. If I try to parse this:

115 097 116 195 168 108 194 183 108 105 116

the result should be "satèl·lit". The code I am using to parse it is:

ASCIIList.add(Character.toString((char) Integer.parseInt(asciiValue)));

But the result is "satÃ¨lÂ·lit". I saw that, for example, "è" -> "195 168". I do not know how to parse it correctly.

amorotez
  • 3
    Possible duplicate of [How to convert ASCII code (0-255) to a String of the associated character?](https://stackoverflow.com/questions/7693994/how-to-convert-ascii-code-0-255-to-a-string-of-the-associated-character) – Σωτήρης Ραφαήλ Nov 08 '18 at 15:25
  • 3
    It's not ASCII. It's obviously UTF-8. – Codo Nov 08 '18 at 15:28
  • @amorotez. There is no text but encoded text. In addition to the byte values, you have to know the character encoding. Be very skeptical when someone says ASCII. It's very likely not ASCII. (Also, your code was assuming ISO 8859-1, anyway.) – Tom Blodget Nov 09 '18 at 03:18

1 Answer


Assuming you have already split the input into an array of strings, the code could look like this:

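import java.nio.charset.StandardCharsets;

String convertToString(String[] numberArray) {
    byte[] utf8Bytes = new byte[numberArray.length];
    for (int i = 0; i < numberArray.length; i++) {
        // Values 128-255 wrap to negative bytes; the bit pattern stays the same
        utf8Bytes[i] = (byte) Integer.parseInt(numberArray[i]);
    }
    // Decode the whole byte sequence at once so multi-byte characters are combined
    return new String(utf8Bytes, StandardCharsets.UTF_8);
}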

So each number becomes a byte. The entire byte array is then converted into a string using the UTF-8 charset.
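As a usage sketch, the conversion can be combined with the splitting step. This assumes the input arrives as a single whitespace-separated string; the class name `ByteListDemo` and the helper `decode` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class ByteListDemo {
    // Hypothetical helper: decodes whitespace-separated decimal byte values as UTF-8.
    static String decode(String input) {
        String[] parts = input.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt accepts leading zeros ("097" -> 97);
            // the cast maps 128..255 onto Java's signed byte range
            bytes[i] = (byte) Integer.parseInt(parts[i]);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample input taken from the question
        System.out.println(decode("115 097 116 195 168 108 194 183 108 105 116"));
    }
}
```

For the sample input from the question, `decode` returns "satèl·lit". How it is displayed depends on the console encoding.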

UTF-8 uses multiple bytes to represent characters outside the ASCII range. In your example, this affects "è" (195 168) and "·" (194 183).
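To see the multi-byte encoding directly, the following sketch goes the other way and prints the UTF-8 byte values of individual characters. The class name `Utf8BytesDemo` and the helper `utf8Values` are hypothetical; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8BytesDemo {
    // Returns the unsigned decimal values of the UTF-8 encoding of the text.
    static int[] utf8Values(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int[] values = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            values[i] = bytes[i] & 0xFF; // undo Java's signed byte representation
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(utf8Values("\u00e8"))); // è -> [195, 168]
        System.out.println(Arrays.toString(utf8Values("\u00b7"))); // · -> [194, 183]
        System.out.println(Arrays.toString(utf8Values("s")));      // ASCII -> [115]

        // Decoding the two bytes of "è" one byte per character (ISO 8859-1)
        // yields two separate characters instead of one.
        byte[] eGrave = {(byte) 195, (byte) 168};
        System.out.println(new String(eGrave, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

The last line shows why converting each number to a `char` on its own produces two wrong characters for every two-byte sequence.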

Codo