
I have the following code:

byte[] b = new byte[len]; //len is preset to 157004 in this example
//fill b with data by reading from a socket
String pkt = new String(b);
System.out.println(b.length + " " + pkt.length());

This prints two different values on Ubuntu (157004 and 147549), but the same value twice on OS X. The string is actually an image being transmitted via the ImageIO library, so on OS X I am able to decode the string back into an image just fine, but on Ubuntu I am not.

I am using version 1.6.0_45 on OS X, and tried the same version on Ubuntu, in addition to Oracle JDK 7 and the default OpenJDK.

I noticed that I can get the string length to equal the byte array length by decoding with Latin-1:

String pkt = new String(b,"ISO-8859-1");

However, this does not make the image decodable either, and it's hard to tell what's going on since the string looks like garbage to me.

I'm perplexed that the same JDK version behaves differently on a different OS.

codersarepeople

2 Answers


This string is actually an image being transmitted by the ImageIO library.

And that's where you're going wrong.

An image is not text data - it's binary data. If you really need to encode it in a string, you should use base64. Personally I like the public domain base64 encoder/decoder at iharder.net.

This isn't just true for images - it's true for all binary data which isn't known to be text in a particular encoding, whether that's sound, movies, Word documents, encrypted data, etc. Never treat it as if it were just encoded text - it's a recipe for disaster.
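To illustrate the point, here is a minimal sketch of the base64 round trip. It assumes Java 8's built-in `java.util.Base64` rather than the iharder.net library mentioned above, but any correct base64 implementation behaves the same way:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Demo {
    public static void main(String[] args) {
        // Arbitrary binary data, including bytes that are invalid in UTF-8
        byte[] binary = {(byte) 0x89, 'P', 'N', 'G', (byte) 0xFF, 0x00};

        // Encode the raw bytes as a pure-ASCII string, safe to send as text
        String encoded = Base64.getEncoder().encodeToString(binary);

        // Decode on the other side: the original bytes are recovered exactly
        byte[] decoded = Base64.getDecoder().decode(encoded);

        System.out.println(Arrays.equals(binary, decoded)); // prints true
    }
}
```

Unlike `new String(byte[])`, this round trip is lossless for every possible byte value.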

Jon Skeet
  • So when I write it, I use ImageIO.write(ByteArrayOutputStream), then write that to a ByteBuffer. So I simply need to decode in base64, correct? – codersarepeople Jul 17 '13 at 14:59
  • @codersarepeople: No, you'd convert the `byte[]` into a string directly with base64... that's the *encoding*. Then when the string is recovered on the other side, you would *decode* it from base64 to a `byte[]` again, and then wrap that in a `ByteArrayInputStream`. – Jon Skeet Jul 17 '13 at 15:03
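The flow described in that comment can be sketched end to end. This is an assumption-laden illustration (PNG format, Java 8's `java.util.Base64`, and a synthetic `BufferedImage` standing in for the real image), not the asker's actual code:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import javax.imageio.ImageIO;

public class ImageRoundTrip {
    public static void main(String[] args) throws Exception {
        // Sender side: serialize the image to raw bytes, then base64-encode
        BufferedImage img = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(img, "png", baos);
        String pkt = Base64.getEncoder().encodeToString(baos.toByteArray());

        // Receiver side: base64-decode back to bytes, wrap in a stream, read
        byte[] bytes = Base64.getDecoder().decode(pkt);
        BufferedImage received = ImageIO.read(new ByteArrayInputStream(bytes));
        System.out.println(received.getWidth() + "x" + received.getHeight()); // 10x10
    }
}
```

The string `pkt` contains only ASCII characters, so it survives any charset conversion along the way.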

Ubuntu's default charset is UTF-8, a variable-length encoding in which many byte sequences are invalid. When `new String(b)` hits an invalid sequence it substitutes the replacement character, so the resulting string is both shorter than the byte array and no longer contains the original data. On OS X the Java 6 default (MacRoman, a single-byte encoding) happens to map every byte to a character, which is why the lengths match there. That's the source of the difference, but for the solution I defer to Jon's answer.
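A small sketch makes the corruption visible. Decoding arbitrary bytes with UTF-8 is lossy, while ISO-8859-1 round-trips every byte (though the resulting "text" is still meaningless, as the asker observed):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeDemo {
    public static void main(String[] args) {
        // 0xFF 0xFE is not a valid byte sequence in UTF-8
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 'A'};

        // UTF-8: invalid bytes become U+FFFD, so the original data is destroyed
        String utf8 = new String(data, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(data,
                utf8.getBytes(StandardCharsets.UTF_8)));     // prints false

        // ISO-8859-1: every byte maps to exactly one char, so the round trip
        // is lossless - which is why the lengths matched with Latin-1
        String latin1 = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(data,
                latin1.getBytes(StandardCharsets.ISO_8859_1))); // prints true
    }
}
```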

kiheru