
I've written a simple Java snippet that takes a String, converts it to a byte[], and compresses it with GZIP. It then decompresses the result to get the byte[] back, but the decompressed array contains one extra garbage byte. Why is that byte there?

public static void main(String[] args) throws Exception {

    String testString = "Sample String here";
    byte[] originalBytes = testString.getBytes();

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPOutputStream gzos = new GZIPOutputStream(baos);
    gzos.write(originalBytes);
    gzos.close();

    byte[] compressedBytes = baos.toByteArray();

    ByteArrayInputStream bais = new ByteArrayInputStream(compressedBytes);
    GZIPInputStream gzis = new GZIPInputStream(bais);

    ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
    while(gzis.available() > 0) {
        dbaos.write(gzis.read());
    }
    byte[] decompressedBytes = dbaos.toByteArray();
    String decompressedString = new String(decompressedBytes);

    System.out.println(">>" + decompressedString + "<<");
    System.out.println("Size of bytes before: " + originalBytes.length);
    System.out.println("Size of bytes after: " + decompressedBytes.length);

}

Output:

>>Sample String here�<<
Size of bytes before: 18
Size of bytes after: 19

Can someone tell me why there is a garbage byte? How do I get rid of it WITHOUT changing the setup of the code above?

Ahmad

1 Answer


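The extra byte comes from using `available()` as the loop condition. `GZIPInputStream` inherits `available()` from `InflaterInputStream`, which is documented to return 1 until EOF has been reached; it is not a count of the bytes left. Here it still reports 1 after the last real byte has been read, so the loop calls `read()` one more time. That call returns -1 (end of stream), and `dbaos.write(-1)` stores only the low 8 bits, 0xFF, which shows up as the garbage character in the output. Instead, read the stream and stop when `read()` returns a value less than 0. Change this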

ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
while(gzis.available() > 0) {
    dbaos.write(gzis.read());
}

to something like

ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
int b;
while ((b = gzis.read()) >= 0) {
    dbaos.write(b);
}

and I get

>>Sample String here<<
Size of bytes before: 18
Size of bytes after: 18
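For reference, here is a self-contained sketch of the same round trip with the fixed loop, plus a one-line check of what the old loop appended on its final pass. The class name `GzipRoundTrip` and the helpers `gzip`/`gunzip` are hypothetical names chosen for illustration, and the sketch assumes UTF-8 explicitly instead of the platform default charset used by `getBytes()` in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name, for illustration only.
public class GzipRoundTrip {

    // Compresses data with GZIP and returns the compressed bytes.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream finishes the stream and writes the GZIP trailer.
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(data);
        }
        return baos.toByteArray();
    }

    // Decompresses until read() signals end of stream with -1,
    // instead of relying on available().
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream dbaos = new ByteArrayOutputStream();
        try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = gzis.read()) >= 0) {
                dbaos.write(b);
            }
        }
        return dbaos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "Sample String here".getBytes(StandardCharsets.UTF_8);
        byte[] decompressed = gunzip(gzip(original));
        System.out.println(">>" + new String(decompressed, StandardCharsets.UTF_8) + "<<");
        System.out.println("Size of bytes before: " + original.length);
        System.out.println("Size of bytes after: " + decompressed.length);

        // What the old loop did on its last iteration: write(-1) keeps only
        // the low 8 bits, so the stored byte is 0xFF.
        ByteArrayOutputStream demo = new ByteArrayOutputStream();
        demo.write(-1);
        System.out.println("write(-1) stored: " + String.format("0x%02X", demo.toByteArray()[0] & 0xFF));
    }
}
```

Running it should print the same three lines as above followed by `write(-1) stored: 0xFF`. On Java 9 and later, `InputStream.readAllBytes()` performs the same read-until-end loop for you.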
Elliott Frisch
  • What if the byte read is a legitimate negative value, which it can be in my actual case (String compression is not what I'm actually doing in my real code)? – Ahmad Oct 04 '17 at 01:38
  • That is why `read` returns an `int`. This is how you read from a stream until the end. Your way adds an extra byte. – Elliott Frisch Oct 04 '17 at 01:44
  • You mean to say `read` will always return a non-negative integer until it reaches the end? – Ahmad Oct 04 '17 at 01:51
  • Let's check [`InputStream.read()`](https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#read--), it says (in part), *The value byte is returned as an `int` in the range `0` to `255`.* **and** it returns *the next `byte` of data, or `-1` if the end of the stream is reached.* – Elliott Frisch Oct 04 '17 at 01:55
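To make the point about negative bytes concrete, the sketch below shows that `read()` hands back a stored byte of -1 as 255, so only the real end of stream produces -1, and that a GZIP round trip of all 256 byte values (including every negative one) comes back intact with the `!= -1` loop. The class name `NegativeBytes` and the helper `readAll` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name, for illustration only.
public class NegativeBytes {

    // Reads until read() returns -1. Data bytes come back as 0..255, so a
    // stored byte of -1 (0xFF) arrives as 255 and cannot be mistaken for EOF.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b); // write(int) keeps the low 8 bits, restoring the signed byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Plain stream: negative bytes are returned as unsigned ints, then -1 at the end.
        InputStream raw = new ByteArrayInputStream(new byte[] {(byte) -1, (byte) -128, 0, 127});
        System.out.println(raw.read() + " " + raw.read() + " " + raw.read() + " "
                + raw.read() + " " + raw.read()); // prints "255 128 0 127 -1"

        // GZIP round trip of every byte value; 128..255 are stored as -128..-1.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(compressed)) {
            gzos.write(all);
        }
        byte[] restored;
        try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
            restored = readAll(gzis);
        }
        System.out.println("round trip equal: " + Arrays.equals(all, restored));
    }
}
```

The `int` return type is what makes this work: 256 possible byte values plus one extra value for end of stream do not fit in a `byte`.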