
Before asking, I checked out this question:
Determine if characters in a string are all of a specific character set

...and tried the selected answer (with UTF-8):
StandardCharsets.UTF_8.newEncoder().canEncode(input);

where `input` is a String.

I also tried `CharsetDecoder`, without any useful result: the input is always reported as valid.

Alan
  • Check this link, it might help - https://stackoverflow.com/questions/6622226/check-if-a-string-is-valid-utf-8-encoded-in-java – JavaLearner1 Aug 08 '19 at 10:19
  • As far as I know, the expression `StandardCharsets.UTF_8.newEncoder().canEncode(input);` is equivalent to `true` (i.e. any java `String` can be encoded in UTF-8) – Eran Aug 08 '19 at 10:21
  • Do you have examples of input Strings? – Bentaye Aug 08 '19 at 10:22
  • @Bentaye yes I tried with this one `` – Alan Aug 08 '19 at 10:24
  • @Eran Could be, I have seen some examples where an `UnsupportedEncodingException` may be thrown – Alan Aug 08 '19 at 10:26
  • @Alan which encoding was used in those examples, and which characters failed to be encoded? – Eran Aug 08 '19 at 10:31

1 Answer


A Java String is in UTF-16 format:

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.

UTF-16 is:

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode.

UTF-8 is:

UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes.

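It follows that every valid Unicode code point that can appear in a Java String can be encoded in UTF-8. The one exception is an unpaired surrogate: a String may hold one, but it is not a valid code point and has no UTF-8 encoding.

Therefore

StandardCharsets.UTF_8.newEncoder().canEncode(input);

should return true for any String that contains no unpaired surrogates.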
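To make this concrete, here is a minimal sketch. It assumes the real goal is to tell well-formed UTF-8 from broken input, and the class and method names (`Utf8Check`, `canEncodeAsUtf8`, `isValidUtf8`) are hypothetical, chosen only for illustration. It shows that `canEncode` on a String fails only for an unpaired surrogate, and that validating UTF-8 is really a question about raw bytes, which a `CharsetDecoder` set to report errors can answer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class Utf8Check {

    // True if the String can be encoded as UTF-8.
    // For a Java String this is false only when it holds an unpaired surrogate.
    static boolean canEncodeAsUtf8(String input) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(input);
    }

    // True if the raw bytes are well-formed UTF-8.
    // The decoder is told to report errors instead of silently replacing them.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed text, including a supplementary character (surrogate pair).
        System.out.println(canEncodeAsUtf8("h\u00E9llo \uD83D\uDE00"));
        // A lone high surrogate.
        System.out.println(canEncodeAsUtf8("\uD83D"));
        // Bytes produced by a UTF-8 encoder.
        System.out.println(isValidUtf8("h\u00E9llo".getBytes(StandardCharsets.UTF_8)));
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0x28}));
    }
}
```

If the data arrives as bytes (from a file or a socket, say), the byte-level check is probably the one to use. Once the bytes have been decoded into a String, malformed sequences have normally already been replaced, so checking the String tells you little.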

Eran