As far as I know, a char in Java is 2 bytes, so why does this test pass?

assertEquals(4, "test".getBytes(Charset.forName("UTF-8")).length);
piotrpo
  • See here: http://stackoverflow.com/questions/5078314/isnt-the-size-of-character-in-java-2-bytes – Andrei Olar Mar 23 '17 at 12:36
  • Because you ask for bytes in UTF-8 encoding, not for Java's `char`s? – 9000 Mar 23 '17 at 12:36
  • You explicitly converted to UTF-8 encoding, so this is really not surprising. – harold Mar 23 '17 at 12:36
  • Here's some reading: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ – Kayaman Mar 23 '17 at 12:54

1 Answer

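A Java `char` is a 16-bit UTF-16 code unit, but `getBytes` re-encodes the string into the charset you pass to it. In UTF-8, characters in the range 0x00 to 0x7F take only 1 byte each, so the four ASCII characters of "test" encode to 4 bytes.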

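"test".getBytes(Charset.forName("UTF-16BE")).length

..would return 8, i.e. 2 bytes per character. Note that plain "UTF-16" returns 10 in Java, because that charset also writes a 2-byte byte-order mark in front of the encoded text.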
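To see the difference side by side, here is a minimal sketch that compares the encoded length of the same string under several charsets. The class name `EncodingLengths` and the helper `byteLength` are hypothetical names made up for this illustration; only `String.getBytes(Charset)` and the `StandardCharsets` constants come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
final class EncodingLengths {

    // Hypothetical helper: number of bytes the string occupies once encoded in cs.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "test";

        // Number of UTF-16 code units (chars) in the string: 4
        System.out.println(s.length());

        // ASCII characters take 1 byte each in UTF-8: 4
        System.out.println(byteLength(s, StandardCharsets.UTF_8));

        // 2 bytes per char, no byte-order mark: 8
        System.out.println(byteLength(s, StandardCharsets.UTF_16BE));

        // 2-byte byte-order mark plus 2 bytes per char: 10
        System.out.println(byteLength(s, StandardCharsets.UTF_16));

        // U+00E9 lies outside 0x00..0x7F, so it needs 2 bytes in UTF-8: 2
        System.out.println(byteLength("\u00e9", StandardCharsets.UTF_8));
    }
}
```

The last line shows why the original test only passes for ASCII input: as soon as a string contains a character above 0x7F, its UTF-8 byte count is larger than its `length()`.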

EDIT: Added @Rossums comment for more detail.

Steve Smith