For two days now, I've been searching for a way to check whether a value from the database is UTF-8 encoded in Java. So far, I've read that Java strings use Unicode (UTF-16) internally. I've tried the suggested answers from here and here, but neither seems to work properly: the first always returns false, while the second always returns true.
Examples of the strings I'm trying to check are below; all except the last are UTF-8 encoded:
ABCDEF, katakana, カタカナ and �K�{�`�F�b�N�G���[
One idea I've been trying is to get the bytes of the string using UTF-8, get the bytes again using the default encoding, and then compare the two lengths like so:
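// Encode the same string as UTF-8 and with the platform default charset
byte[] utf8byte = str.getBytes("UTF-8");
byte[] bytes = str.getBytes();
// Treat equal byte lengths as "the string is UTF-8"
if (utf8byte.length == bytes.length) {
    return true;
}
return false;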
However, with this logic only the first string returns true. From my understanding, this is because not all characters are encoded in a single byte.
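To illustrate the byte counts involved, here is a minimal sketch that encodes a string in UTF-8 and in a single-byte charset and reports the lengths. The class and method names are made up for this example, and ISO-8859-1 only stands in for "some single-byte default encoding"; the real platform default may differ.

```java
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in ISO-8859-1; characters it cannot represent
    // are replaced by a single '?' byte
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // "\u30AB\u30BF\u30AB\u30CA" is the katakana sample, written with escapes
        // so the source file encoding does not matter
        String[] samples = {"ABCDEF", "katakana", "\u30AB\u30BF\u30AB\u30CA"};
        for (String s : samples) {
            System.out.println(s + ": utf8=" + utf8Length(s) + ", latin1=" + latin1Length(s));
        }
    }
}
```

ASCII-only strings give the same length in both charsets, while each katakana character takes 3 bytes in UTF-8, so the lengths diverge even though the string itself is perfectly valid.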
So what is the best approach to check whether a string from the database is UTF-8 encoded? I'd really appreciate any ideas. Thanks in advance.
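For reference, this is roughly the kind of check I have in mind, as a sketch only. It assumes the raw column bytes are available (for example via ResultSet.getBytes) rather than an already decoded String, since a Java String no longer carries the encoding it was read from. The class name Utf8Check and the method isValidUtf8 are hypothetical names for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: returns true if the bytes form a well-formed UTF-8 sequence
    static boolean isValidUtf8(byte[] raw) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u30AB\u30BF\u30AB\u30CA".getBytes(StandardCharsets.UTF_8);
        // 0x83 cannot start a UTF-8 sequence, so these bytes are malformed as UTF-8
        byte[] notUtf8 = {(byte) 0x83, 0x4A, (byte) 0x83, 0x5E};
        System.out.println(isValidUtf8(utf8));    // prints "true"
        System.out.println(isValidUtf8(notUtf8)); // prints "false"
    }
}
```

A caveat with this sketch: it only tells whether the bytes are well-formed UTF-8. Pure ASCII data passes it too, and it cannot prove that UTF-8 was the encoding actually intended.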