I have a string containing letters, digits, and special characters, up to roughly 300 characters long.

I'd like to compress, encode, or compact it (I don't know which is the correct term) so that the final string is half the length of the original, or less.

The resulting string, unreadable to humans, can then be sent by any mechanism to the recipient, who should be able to decode it to recover the original string.

Please provide some pointers on how I should implement this.

I have some understanding of Huffman coding, but it requires the symbol table to be sent as well.

I have looked at base-64 (I don't know if I understood it correctly), but it increases the string length.

All comments and pointers welcome.

I have looked at StackOverflow Qs-1,

Thanks,
Sen

Navaneeth Sen

1 Answer

The only way to guarantee that you can compress by a factor of two is to throw away about half of the 300 characters.

If you can limit the number of possible characters, then you can reduce the size to a fraction log(n)/log(256) of the original, where n is that number. E.g., if you can limit it to 85 characters, i.e. 52 alphabetic, ten numeric, and 23 special characters (including spaces, newline markers, etc.), then you could get a factor of about 0.8.
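To make the log(n)/log(256) figure concrete, here is a minimal sketch that treats the whole string as one large base-85 number and stores it as raw bytes. The particular 85-character `ALPHABET`, and the names `size_factor`, `pack`, and `unpack`, are assumptions made up for illustration. The output is binary, so it only helps if the channel can carry arbitrary 8-bit bytes.

```python
import math
import string

# Hypothetical 85-symbol alphabet: 52 letters, 10 digits, 23 other characters.
ALPHABET = string.ascii_letters + string.digits + " \n.,;:!?-_()[]{}'\"/@#&+"
assert len(ALPHABET) == 85 and len(set(ALPHABET)) == 85
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def size_factor(n: int) -> float:
    """Best-case output/input size ratio for an n-symbol alphabet stored in 8-bit bytes."""
    return math.log(n) / math.log(256)


def pack(text: str) -> bytes:
    """Interpret text as one big base-85 number and return it as raw bytes."""
    value = 1  # leading sentinel so that leading index-0 symbols are not lost
    for ch in text:
        # Raises KeyError if ch is not in the assumed alphabet.
        value = value * len(ALPHABET) + INDEX[ch]
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def unpack(data: bytes) -> str:
    """Invert pack(): peel off base-85 digits until only the sentinel remains."""
    value = int.from_bytes(data, "big")
    chars = []
    while value > 1:
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


message = ("Hello, world! " * 22)[:300]  # a 300-character sample
packed = pack(message)
print(len(message), len(packed), round(size_factor(85), 3))
```

The saving comes purely from the smaller alphabet, not from any redundancy in the text, so it falls well short of the factor of two asked for in the question.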

You can try various conventional compression methods such as zlib, but you won't get far with only 300 characters. zlib does Huffman coding and also makes use of matching strings in the history. Generally, much more history than 300 characters is needed before such approaches give you much gain. If you have a sequence of 100 or 1000 such 300-character messages, then group them and compress them together. Then you might see some real gain.
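One way to check this on your own data is to compare compressing each message separately against compressing the whole batch at once. The sketch below uses Python's standard `zlib` module. The sample `messages`, the NUL separator, and the helper name `compressed_sizes` are assumptions for illustration, and the sizes you see will depend entirely on how similar your real messages are.

```python
import zlib

# Hypothetical sample data: 100 short, similar messages.
messages = [
    f"id={i:04d};user=alice;status=ok;note=the quick brown fox jumps over the lazy dog"
    for i in range(100)
]

SEPARATOR = "\x00"  # assumed never to occur inside a message


def compressed_sizes(msgs):
    """Return (total bytes when compressed one by one, bytes when compressed as one batch)."""
    individual = sum(len(zlib.compress(m.encode("utf-8"), 9)) for m in msgs)
    batch = len(zlib.compress(SEPARATOR.join(msgs).encode("utf-8"), 9))
    return individual, batch


individual, batch = compressed_sizes(messages)
raw = sum(len(m.encode("utf-8")) for m in messages)
print("raw:", raw, "one by one:", individual, "batched:", batch)

# The recipient reverses the batch step like this.
blob = zlib.compress(SEPARATOR.join(messages).encode("utf-8"), 9)
restored = zlib.decompress(blob).decode("utf-8").split(SEPARATOR)
```

Batching helps here because later messages can reuse strings already seen in earlier ones, which a single short message cannot do.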

Mark Adler