
I'm reading a BLOB from a MySQL database using JDBC. I know the resulting byte array is good: I sent it over HTTP as a string literal of numbers, one per byte, and successfully downloaded the result (a JPEG), just to prove that the MySQL -> Java servlet data is intact.

Constructing a new string from this byte array using UTF-8 yields a string that is shorter than the byte array and contains values I can't decipher. If UTF-8 uses AT LEAST 1 byte per character, shouldn't the resulting string be AT A MINIMUM the length of the byte array it's generated from? (For this particular example, the byte length is 12,079,474 and the resulting string length is 11,501,845.)

Thanks for your time!

DWR
  • You are contradicting your own statement. If multiple bytes make up a char, then the char length will be smaller, right? Also, look [here](http://stackoverflow.com/questions/16270994/difference-between-string-length-and-string-getbytes-length) – Gurwinder Singh Dec 31 '16 at 16:53
  • A .jpg is not text, it is binary data. It makes no sense to try to interpret the bytes of a jpg image as a string. – nos Dec 31 '16 at 16:56
  • If you need the binary data as a string, consider converting each byte to hex or similar to have a bidirectional operation – Bohemian Dec 31 '16 at 17:18
  • Oh, thanks! Y'all pushed me in the right direction. I want a string of each byte interpreted as its Unicode CODEPOINT, not a string of the byte array interpreted as a UTF-8 literal. – DWR Dec 31 '16 at 17:26
  • But if the byte array contains binary data, like a jpg image, there won't be any codepoints to extract in the first place. You cannot treat binary data as if it were text, you can only treat text as text. – Remy Lebeau Jan 04 '17 at 03:03
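
The two suggestions in the comments above (a reversible hex encoding, and one character per byte) can be sketched in Java roughly as follows. This is only an illustration, not code from the question: the class name `ByteStringRoundTrip`, its method names, and the four sample bytes are all made up here. It relies on ISO-8859-1 mapping every byte value 0x00..0xFF to the code point with the same number, which is what makes the conversion reversible.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class ByteStringRoundTrip {

    // ISO-8859-1 decodes each byte to the code point with the same numeric value,
    // so the string always has exactly one char per input byte.
    static String toCodePointString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses toCodePointString: every char in 0x00..0xFF becomes one byte again.
    static byte[] fromCodePointString(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Alternative: two lowercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask with 0xFF so negative byte values format as 80..ff.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample bytes chosen for the demo (assumption, not data from the question).
        byte[] sample = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        String asString = toCodePointString(sample);
        System.out.println(asString.length());                                    // 4
        System.out.println(Arrays.equals(sample, fromCodePointString(asString))); // true
        System.out.println(toHex(sample));                                        // ffd8ffe0
    }
}
```

Either form keeps the data recoverable, but as the comments point out, the result is still binary data wrapped in a string rather than meaningful text.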

1 Answer


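Your bytes contain values that UTF-8 interprets as lead bytes and continuation bytes, i.e. bytes that have a special meaning in UTF-8 and combine to form one Unicode character from multiple bytes. Every such multi-byte sequence is decoded into fewer characters than it has bytes, which is why your string is shorter than the byte array. Bytes that do not form a valid sequence are replaced during decoding, so the original data cannot be recovered from the string.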
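
A minimal sketch of that effect, assuming the string is built with `new String(bytes, StandardCharsets.UTF_8)` as the question implies. The class name `Utf8LengthDemo`, its methods, and the sample bytes are hypothetical and only serve the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class Utf8LengthDemo {

    // Decodes the bytes as UTF-8 and returns the number of Java chars produced.
    static int decodedLength(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    // Returns true only if decoding as UTF-8 and re-encoding gives back the same bytes.
    static boolean survivesUtf8RoundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the two-byte UTF-8 encoding of U+00E9.
        byte[] twoByteSequence = {(byte) 0xC3, (byte) 0xA9};
        // 0xE2 0x82 0xAC is the three-byte UTF-8 encoding of U+20AC.
        byte[] threeByteSequence = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        // 0xFF never occurs in valid UTF-8, so these bytes cannot decode cleanly.
        byte[] binaryBytes = {(byte) 0xFF, (byte) 0xD8};

        System.out.println(decodedLength(twoByteSequence));     // 1
        System.out.println(decodedLength(threeByteSequence));   // 1
        System.out.println(survivesUtf8RoundTrip(binaryBytes)); // false
    }
}
```

In arbitrary binary data such as a JPEG, some byte runs happen to look like valid multi-byte sequences and collapse, while others are malformed and get replaced, so both the length and the content of the resulting string differ from the original bytes.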

Remy Lebeau
Marc Balmer