0

I recently read whats-the-difference-between-unicode-and-utf8 and related topics.

Now I am reading a book, and it says the following about character streams:

Data dealt with is 16-bit Unicode characters.

As I understand it, "16-bit Unicode characters" means UTF-16. If I don't set the encoding explicitly, Java uses the default OS encoding, but I can set the encoding explicitly using InputStreamReader/OutputStreamWriter.

Am I misunderstanding something, or is the phrase "Data dealt with is 16-bit Unicode characters" incorrect for character streams?

gstackoverflow

3 Answers

0

The only thing a computer knows is 1s and 0s. Specifying an encoding is your way of telling the JVM how to "interpret" or "decode" the 1s and 0s. If you use the UTF-16 encoding on an InputStream of UTF-8 data, your "decoding" algorithm will not match the "encoding" algorithm, and you will get garbage characters all over the place.
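
To make the mismatch concrete, here is a minimal sketch. The class and method names (MismatchedDecoding, decodeUtf8BytesAs) are made up for illustration; only the String and StandardCharsets calls are standard Java. It encodes a string as UTF-8 bytes and then decodes those same bytes with a charset of your choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchedDecoding {
    // Hypothetical helper: encode text as UTF-8, then decode the bytes with 'charset'.
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        String original = "hello!";
        // Matching charset: the text comes back unchanged.
        System.out.println(decodeUtf8BytesAs(original, StandardCharsets.UTF_8));
        // Mismatched charset: each pair of bytes is read as one 16-bit unit,
        // so six ASCII bytes turn into three unrelated characters.
        String garbled = decodeUtf8BytesAs(original, StandardCharsets.UTF_16);
        System.out.println(garbled + " (length " + garbled.length() + ")");
    }
}
```

The bytes are identical in both calls; only the interpretation changes.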

0

“Data dealt with is 16-bit Unicode characters” means that the data is 16-bit code units. Each unit is either a surrogate code unit or represents a character in the Basic Multilingual Plane (BMP), in the range U+0000 to U+FFFF. A surrogate code unit as such does not represent anything; only a pair of surrogate code units may have a meaning, denoting a character outside the BMP.

So the “characters” are not necessarily characters at all, even though we may refer to them as “Java characters” for example.
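
A small sketch of the difference between code units and characters, assuming U+1F600 as an arbitrary example of a character outside the BMP. The class name CodeUnitsDemo and the describe helper are illustrative only.

```java
class CodeUnitsDemo {
    // Hypothetical helper: report 16-bit code units versus Unicode code points.
    static String describe(String s) {
        return "chars=" + s.length() + " codePoints=" + s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                                         // U+0041, inside the BMP
        String nonBmp = new String(Character.toChars(0x1F600));  // U+1F600, outside the BMP

        System.out.println(describe(bmp));     // one code unit, one code point
        System.out.println(describe(nonBmp));  // two code units, one code point

        // Each half of the pair is a surrogate and means nothing on its own.
        System.out.println(Character.isHighSurrogate(nonBmp.charAt(0)));
        System.out.println(Character.isLowSurrogate(nonBmp.charAt(1)));
    }
}
```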

When you read, say, a UTF-8 encoded file, you should use routines that interpret the UTF-8 encoded data for you and yield 16-bit code units.

Jukka K. Korpela
0

"16-bit Unicode character" is a redundant synonym for a Java char. A char is an unsigned 16-bit value, and as you have surmised, a sequence of chars is a UTF-16-encoded string.

The phrase "Data dealt with is 16-bit Unicode characters" refers to the fact that a Reader or Writer only reads or writes char values (or ints which hold char values).
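
As a rough illustration, the sketch below drains a Reader one value at a time. ReaderCharsDemo and readCodeUnits are made-up names; Reader.read() itself is the standard method, which returns an int holding a char value, or -1 at end of stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReaderCharsDemo {
    // Hypothetical helper: collect every value returned by Reader.read().
    static List<Integer> readCodeUnits(Reader reader) {
        List<Integer> units = new ArrayList<>();
        try {
            int value;
            while ((value = reader.read()) != -1) {  // -1 signals end of stream
                units.add(value);                    // otherwise a char value in 0..0xFFFF
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }

    public static void main(String[] args) {
        // "A" followed by U+1F600, written as its two surrogate code units.
        for (int unit : readCodeUnits(new StringReader("A\uD83D\uDE00"))) {
            System.out.printf("0x%04X%n", unit);
        }
    }
}
```

The Reader hands back the two surrogates separately; it deals in char values, not in whole characters.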

Encoding is not a consideration when code uses a Reader or Writer, because it only works with char values. Encoding only matters when, as you have stated, you are creating a Reader or Writer that wraps an InputStream or OutputStream—specifically, when creating an InputStreamReader or OutputStreamWriter.
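
Here is a minimal sketch of that boundary, using in-memory byte streams in place of real files. The names EncodingBoundaryDemo, encode and decode are illustrative; the OutputStreamWriter and InputStreamReader constructors that take a Charset are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBoundaryDemo {
    // chars -> bytes: the charset is chosen when the Writer wraps the OutputStream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);  // the code using the Writer only sees chars
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // bytes -> chars: the charset is chosen when the Reader wraps the InputStream.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = "\u20AC";  // the euro sign, a single Java char
        System.out.println(encode(text, StandardCharsets.UTF_8).length + " bytes as UTF-8");
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length + " bytes as UTF-16BE");
        System.out.println(text.equals(decode(encode(text, StandardCharsets.UTF_8), StandardCharsets.UTF_8)));
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, keeps the result the same across machines.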

VGR