I am using GZIPOutputStream
in my java program to compress big strings, and finally storing it in database.
I can see that while compressing English text, I am achieving 1/4 to 1/10 compression ration (depending on the string value). So say for example my original English text is 100kb, then on an average compressed text will be somewhere around 30kb.
But when I am compressing unicode characters, the compressed string is actually occupying more bytes than the original string. Say for example, my original unicode string is 100kb, then the compressed version is coming out to 200kb.
Unicode string example: "嗨,这是,短信计数测试持续for.Hi这是短"
Can anyone suggest that how can I achieve compression for unicode text as well? and why the compressed version is actually bigger than the original version?
My compression code in Java:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
GZIPOutputStream zos = new GZIPOutputStream(baos);
zos.write(text.getBytes("UTF-8"));
zos.finish();
zos.flush();
byte[] udpBuffer = baos.toByteArray();