
I was trying to see the UTF-8 bytes of 👍 in both Java and JavaScript.

In JavaScript,

new TextEncoder().encode("👍") returns => [240, 159, 145, 141]

while in Java,

"".getBytes("UTF-8") returns => [-16, -97, -111, -115]

I converted both byte arrays to hex strings, using the appropriate method in each language, and both returned F09F918D.

In fact, -16 & 0xFF gives => 240
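That masking step is the bridge between the two representations. A minimal Java sketch (class and variable names are mine) of recovering the unsigned values and the shared hex string:

```java
public class Utf8Bytes {
    public static void main(String[] args) throws Exception {
        byte[] bytes = "👍".getBytes("UTF-8"); // [-16, -97, -111, -115]
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            int unsigned = b & 0xFF;           // e.g. -16 & 0xFF == 240
            hex.append(String.format("%02X", unsigned));
        }
        System.out.println(hex);               // F09F918D
    }
}
```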

I am curious why the two languages choose different ways of representing byte arrays. It took me a while to figure this much out.

Vigneshwaran

1 Answer

In Java, all bytes are signed, so the range of one byte is -128 to 127. In JavaScript, TextEncoder returns a Uint8Array, whose elements are unsigned 8-bit integers, so they are shown in decimal using the full range 0 to 255.

Therefore, if you convert both results to their one-byte hexadecimal representation, they are the same: F0 9F 91 8D.
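To make that concrete, here is a small Java sketch (the class name is mine): the bit pattern 0xF0 reads as -16 when interpreted as a signed byte and as 240 once widened to an int with a mask, yet both print as the same hex digits.

```java
public class SignedByteDemo {
    public static void main(String[] args) {
        byte b = (byte) 0xF0;                 // bit pattern 1111 0000
        System.out.println(b);                // -16: Java bytes are signed
        System.out.println(b & 0xFF);         // 240: same bits, unsigned view
        System.out.println(String.format("%02X", b & 0xFF)); // F0
    }
}
```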

As for why Java omitted unsigned integer types in the first place, that is a separate discussion.

bezmax