TLDR
Java uses two characters to represent UTF-16. Using Arrays.sort (unstable sort) messes with character sequencing. Should I convert char[] to int[] or is there a better way?
Details
Java represents a character as UTF-16. But the Character
class itself wraps char
(16 bit). For UTF-16, it will be an array of two char
s (32 bit).
Sorting a string of UTF-16 characters using the inbuilt sort messes with data. (Arrays.sort uses dual pivot quick sort and Collections.sort uses Arrays.sort to do the heavy lifting.)
To be specific, do you convert char[] to int[] or is there a better way to sort?
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
int[] utfCodes = {128513, 128531, 128557};
String emojis = new String(utfCodes, 0, 3);
System.out.println("Initial String: " + emojis);
char[] chars = emojis.toCharArray();
Arrays.sort(chars);
System.out.println("Sorted String: " + new String(chars));
}
}
Output:
Initial String:
Sorted String: ????