Possible Duplicate:
Best compression algorithm for short text strings

I need help compressing and decompressing a string.

When I compress a short string, the result takes more bytes than the original. When I compress a longer string, the result takes fewer bytes than the original.

My code is below:

package string_compress;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;



// @author Administrator
public class Main {

    public static String compress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        System.out.println("String length : " + str.length());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes());
        gzip.close();

        // ISO-8859-1 maps every byte to exactly one char, so the compressed bytes survive the round trip
        String outStr = out.toString("ISO-8859-1");
        System.out.println("Output String length : " + outStr.length());

        return outStr;
    }

    public static String decompress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        System.out.println("Input String length : " + str.length());
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes("ISO-8859-1")));
        BufferedReader bf = new BufferedReader(new InputStreamReader(gis, "ISO-8859-1"));
        String outStr = "";
        String line;
        // Note: readLine() strips line terminators, so multi-line input loses its newlines here
        while ((line = bf.readLine()) != null) {
            outStr += line;
        }
        System.out.println("Output String length : " + outStr.length());
        return outStr;
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        // String filePath = ".\response.txt";
        // String string = getFileData(filePath);
        String string = "rishi jain is tring to compress the string";

        System.out.println("after compress:");
        String compressed = Main.compress(string);
        System.out.println(compressed);
        System.out.println("after decompress:");
        String decomp = decompress(compressed);
        System.out.println(decomp);
    }
}
user1990643
  • It works as expected. The compression algorithm and packed data have some overhead. The output file needs to contain a header that always has to be there, irrespective of input data size. That's why the output for small strings is larger than input. – Adam Dyga Jan 31 '13 at 13:27
  • @rae1n right, removed @ then – Adam Dyga Jan 31 '13 at 13:30
  • @AdamDyga Right. But you asked. – rae1 Jan 31 '13 at 13:31

1 Answer

Do not compress short strings: GZIP only pays off above a certain input size, probably 18 bytes or more (see below). Set a length threshold, or discard the compressed version if it is longer than the uncompressed one.

When you need to decompress, look for the GZIP magic sequence (0x1f, 0x8b) at the start of the string. If it is not present, the string was not compressed and should be returned as is.

An uncompressed string that happens to start with this magic sequence must always be compressed, regardless of its size. This should be rare, as neither byte is a printable ASCII character.
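The scheme described so far could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name `OptionalGzip` and the methods `pack`, `unpack`, and `looksGzipped` are hypothetical, the data is kept as a `byte[]` rather than a `String`, and the text is encoded as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original code.
final class OptionalGzip {

    // True if the data begins with the GZIP magic bytes 0x1f, 0x8b.
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer and flushes everything to "out".
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] packed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Compresses the text, but keeps the raw bytes when compression does not help.
    static byte[] pack(String text) throws IOException {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        // Raw data that starts with the magic bytes must be compressed anyway,
        // otherwise unpack() would mistake it for GZIP data.
        if (packed.length < raw.length || looksGzipped(raw)) {
            return packed;
        }
        return raw;
    }

    // Decompresses only when the magic bytes are present; otherwise returns the data as is.
    static String unpack(byte[] data) throws IOException {
        byte[] raw = looksGzipped(data) ? gunzip(data) : data;
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

With this sketch, `OptionalGzip.unpack(OptionalGzip.pack(s))` should give back `s` for both short and long inputs, and a short input such as the 42-character string from the question is stored without compression.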

Of course, the byte after the magic sequence specifies the compression method, and the deflate format it selects does have a "stored" (uncompressed) block type. However, this may not be good enough if you have a lot of strings that are empty or really short, as gzip adds a 10-byte header and an 8-byte footer.
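To see that fixed overhead on your own machine, a small measurement like the one below may help. It is a sketch only: the class `GzipOverhead` and the method `gzippedSize` are hypothetical names, and the input is just a run of the letter 'a', so real text will compress less well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original code.
final class GzipOverhead {

    // Returns the number of bytes GZIP produces for the given input.
    static int gzippedSize(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        // Compare input and output sizes for a few input lengths.
        for (int n : new int[] {0, 1, 10, 42, 1000}) {
            byte[] raw = new byte[n];
            Arrays.fill(raw, (byte) 'a');
            System.out.println(n + " bytes in, " + gzippedSize(raw) + " bytes out");
        }
    }
}
```

Even empty input should produce at least the 18 bytes of header and footer, which is why very short strings always grow.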

Audrius Meškauskas