
I intend to send a string like the following (256 bytes as ASCII hex):

633a88d35a0f8fd172bd21158a03a8bb17ddc0acc6edb8ae19a9dbd1aa855b75319e540910fb70cf7bb51d608219dd4b387623f94262705a9c2c19332240e2a6d696d4cb896abf0101afae1aeebf3d6299675e0e67904e7a544de9e3e65fb9def9b0b047fb57a0b742226d602d386d9e2fe176a88837eddd0c77d6911d386c2e

via SMS on Android, and the content should fit within one message.

As you may know, SMS has a limit of 160 characters per message. I have tried gzipping the data in Java and then encoding the compressed result with Base64, but the compression ratio is not good.

Since the compressed data will be sent via SMS, there needs to be an encoding method to make the compressed bytes "transmittable".

Any ideas?

Thank you for any comments/answers!

– dumbfingers

5 Answers


If you convert to binary, you go from 256 hex digits to 128 bytes. Then use (or modify) one of the techniques mentioned in this thread to convert to an acceptable character set for SMS. (That thread deals with targeting JSON, but the same ideas can be applied to SMS.)
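For the first step, a minimal sketch of the hex-to-binary conversion in Java (a hypothetical `HexToBytes` helper, not from the thread; it assumes the payload is well-formed hex):

```java
public class HexToBytes {
    // Decode a hex string into raw bytes, halving its size.
    static byte[] hexDecode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Each pair of hex digits becomes one byte.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = "633a88d35a0f8fd172bd21158a03a8bb17ddc0acc6edb8ae19a9dbd1aa855b75"
                   + "319e540910fb70cf7bb51d608219dd4b387623f94262705a9c2c19332240e2a6"
                   + "d696d4cb896abf0101afae1aeebf3d6299675e0e67904e7a544de9e3e65fb9de"
                   + "f9b0b047fb57a0b742226d602d386d9e2fe176a88837eddd0c77d6911d386c2e";
        System.out.println(hexDecode(hex).length); // 128
    }
}
```

The 256 hex digits from the question decode to 128 raw bytes, which is what the second encoding step then has to fit into 160 SMS characters.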

– Ted Hopp

You could use Ascii85 (the Base85 variant used by PostScript), since it also compacts runs of zero bytes. Here is the transformation in a Python 3 shell:

>>> import base64
>>> a = '633a88d35a0f8fd172bd21158a03a8bb17ddc0acc6edb8ae19a9dbd1aa855b75319e540910fb70cf7bb51d608219dd4b387623f94262705a9c2c19332240e2a6d696d4cb896abf0101afae1aeebf3d6299675e0e67904e7a544de9e3e65fb9def9b0b047fb57a0b742226d602d386d9e2fe176a88837eddd0c77d6911d386c2e'
>>> encoded = base64.a85encode(bytes.fromhex(a))
>>> len(encoded)
160

Python's standard base64 module provides a85encode; an alternative implementation for reference:

https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py

You may want to port it to Java for your needs.

HTH.
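A minimal Java sketch of such a port (a hypothetical `Ascii85` class, not an official library; it skips the `z` shortcut for zero groups and the `<~ ~>` framing, so every 4 input bytes become exactly 5 output characters):

```java
import java.nio.charset.StandardCharsets;

public class Ascii85 {
    // Encode bytes as Ascii85: each 4-byte big-endian group becomes
    // 5 characters in the range '!'..'u'; a partial final group of
    // n bytes is zero-padded and emits n + 1 characters.
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < data.length) {
            int n = Math.min(4, data.length - i);
            long group = 0;
            for (int j = 0; j < 4; j++) {
                group <<= 8;
                if (j < n) group |= (data[i + j] & 0xFFL);
            }
            // Extract five base-85 digits, most significant first.
            char[] c = new char[5];
            for (int j = 4; j >= 0; j--) {
                c[j] = (char) ('!' + (group % 85));
                group /= 85;
            }
            sb.append(c, 0, n + 1);
            i += n;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Well-known Ascii85 test vector: "Man " encodes to "9jqo^".
        System.out.println(encode("Man ".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

For the 128-byte payload above this always yields 32 × 5 = 160 characters, exactly the single-SMS limit.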

– yesudeep

You can't quite do it. The reason is that MD5-like data maximizes entropy, so gzip and friends will have a hard time getting close to 50% efficiency, and even if they did, it would be hit or miss.

The optimal 2:1 "compression" is: treat every 2 hex chars as one byte and convert them to a single binary byte. That cuts the size in half. However, the binary data can't be sent over SMS, so you have to Base64-encode it, leading to a ~33% increase. That leaves you at ~171 characters. A "Base-128" encoding won't help, since there aren't 128 characters that are certain to transmit.
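The arithmetic is easy to check with Java's built-in `java.util.Base64` (a quick sketch; the zero-filled 128-byte buffer simply stands in for the decoded payload, since only the length matters):

```java
import java.util.Base64;

public class SizeCheck {
    public static void main(String[] args) {
        byte[] raw = new byte[128]; // the 256 hex chars decode to 128 bytes

        // Standard Base64 pads to a multiple of 4: 4 * ceil(128 / 3) chars.
        String padded = Base64.getEncoder().encodeToString(raw);
        System.out.println(padded.length()); // 172

        // Dropping the trailing '=' padding still leaves 171 characters,
        // over the 160-character single-SMS limit either way.
        String unpadded = Base64.getEncoder().withoutPadding().encodeToString(raw);
        System.out.println(unpadded.length()); // 171
    }
}
```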

In short, you need to cut the data down. After all, the easiest way to send less data is to have less data :)

– Sajid
  • Using base85 instead of base64 will save ~7% which should be just enough to squeak by. – Matt Ball Jul 28 '11 at 03:26
  • @Matt - Theoretically true, but non-binary-boundary encoding is a real pain, and then you have to end up treating the entire mess as a number -- eek! Plus, since we have to eventually express it as chars in a binary form, that will likely induce inefficiencies. – Sajid Jul 28 '11 at 03:28
  • base85 squeezes every 4 bytes into 5 characters. So you're essentially just converting back every 5 characters into a single 32-bit value. It's not that complex - certainly less so than gzip. – thomasrutter Jul 28 '11 at 03:35
  • @thomas: Good point. Somehow I was thinking of it over-complicatedly :) – Sajid Jul 28 '11 at 03:41
  • Thanks @Sajid, I just realised the data may contain a lot of entropy, as it is nearly random data... – dumbfingers Jul 28 '11 at 17:02

It really depends on the exact type of data you are trying to send.

If there are predictable patterns in your data, you can probably use Huffman coding (http://en.wikipedia.org/wiki/Huffman_coding) with a pre-defined alphabet of symbols to bring the size down.
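If the data did have skewed symbol frequencies, Huffman coding would assign shorter codes to the common symbols. A minimal sketch (a hypothetical `HuffmanLengths` helper, not from the thread) that builds the tree and reports the code length per symbol:

```java
import java.util.*;

public class HuffmanLengths {
    static class Node {
        final int weight;
        final Character sym;   // null for internal nodes
        final Node left, right;
        Node(int w, Character s, Node l, Node r) { weight = w; sym = s; left = l; right = r; }
    }

    // Build a Huffman tree from symbol frequencies and return each
    // symbol's code length (its depth in the tree).
    static Map<Character, Integer> codeLengths(Map<Character, Integer> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingInt(n -> n.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            pq.add(new Node(e.getValue(), e.getKey(), null, null));
        // Repeatedly merge the two lightest subtrees.
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(a.weight + b.weight, null, a, b));
        }
        Map<Character, Integer> lengths = new HashMap<>();
        walk(pq.poll(), 0, lengths);
        return lengths;
    }

    static void walk(Node n, int depth, Map<Character, Integer> out) {
        if (n.sym != null) { out.put(n.sym, Math.max(depth, 1)); return; }
        walk(n.left, depth + 1, out);
        walk(n.right, depth + 1, out);
    }
}
```

With frequencies like {a:5, b:2, c:1, d:1} this yields lengths 1, 2, 3, 3 (15 bits total versus 18 for a fixed 2-bit code) — but for near-random data like the payload in the question, all frequencies are roughly equal and there is nothing to gain.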

– xordon
  • As @Sajid points out, with MD5-like data, it's going to be hard (and hit-or-miss) to get 50% efficiency. – Matt Ball Jul 28 '11 at 03:32
  • Well, it seems the data I'm trying to send doesn't contain much redundant information. Which means, according to information theory, it has a lot of entropy. – dumbfingers Jul 28 '11 at 17:00

That string is hex-encoded. Therefore it's using 200% of the space of the binary message.

If you used Base64 encoding instead, it would use about 134% of the binary size, which is 171 characters. Still a bit too much.

Base85, which was invented by a relative of mine, could do it. It would use exactly 160 characters.

– thomasrutter