0

I'm currently developing a marketing app on Android, which has a feature to send a URL via SMS. Because I'm using SMS, I want to make the text as short as possible, so it won't be split into parts.

The URL is generated dynamically by app. Different contact will result in different URL, as the app puts some "contact related information" to the URL. And this information is the one needs to be shortened, not the base URL.

I tried using Base64 to shorten the string, but it's not working.

Before Text: Myself|1234567890 Length: 17

After Text: TXlzZWxmfDEyMzQ1Njc4OTA= Length: 25

Then I tried Deflater, and the result is better than Base64, but still it's not shorten the string.

Before Text: Myself|1234567890 Length: 17

After Text: x��,N�I�1426153��4����3�� Length: 24

I've also tried GZIP, and the result is much worse than other method.

Before Text: Myself|1234567890 Length: 17

After Text: ����������������,N�I�1426153��4�����w�������� Length: 36

After comparing test results, I decided to use Base64 as it sometimes works, but I'm completely not satisfied. Can anyone give me a better approach?

EDIT:

I need this String Shortening to be executed OFFLINE, without internet connection. I'm terribly sorry for this sudden change, as our developer team decided so. Any idea?

Community
  • 1
  • 1
Harry
  • 568
  • 1
  • 5
  • 20
  • String Shortening Approach means u just want to short text in encoded form right ?? – Ando Masahashi Jan 16 '15 at 07:43
  • The last two approaches will have overhead, that will always be included in final outcome. However, these approaches are only useful when you compress large quantity of string data. For smaller quantities (like in your examples), you are better off without using them. – waqaslam Jan 16 '15 at 07:44
  • @Ando: Yes, a shorter, encoded version. Do you have any idea? – Harry Jan 16 '15 at 07:45
  • 3
    Why don't you use bit.ly to create shorten url. Refer its API dev.bitly.com – Rahul Sharma Jan 16 '15 at 07:48
  • @RahulSharma: I believe it's worth a try. Thanks for the suggestion! – Harry Jan 16 '15 at 07:51
  • 1
    You can also use Google Url sortner API https://developers.google.com/url-shortener/. http://stackoverflow.com/questions/18372672/how-do-i-use-the-google-url-shortener-api-on-android is an SO entry for the same. – Pankaj Kumar Jan 16 '15 at 09:21
  • Guys, I have edited my question. Now I need the String Shortening to be executed without internet connection. – Harry Jan 19 '15 at 04:44

1 Answers1

1

Base64 on it's own won't work because it typically increases the length of an encoded string by about 37%.

Deflater and GZIP both contain headers that will increase the length of short strings.

However, you can use Huffman coding or Arithmetic coding to take advantage of the fact that some characters are much more common in URLs than others. Generate a frequency table for your strings by generating a thousand of them or so and summing the occurrence of each character, and then generate a Huffman coding table based on these frequencies. You can then use this hard-coded table to encode and decode your strings: don't transmit the table along with the message.

Here is an interactive webpage that allows you to enter various strings and Huffman encode them: you can try it out with your URLs to get a general idea of what kind of compression rate you can expect, however in practice you will get a slightly lower compression rate if you are using the same table for all your strings. For your sample text "Myself|1234567890" the size of Huffman encoded string is 51% of the original.

After you generate your Huffman encoded string you might need to do another pass over it to escape any illegal characters that can't be transmitted in the SMS (or just Base64 encode your Huffman coded string), which might negate your savings from the Huffman encoding somewhat but hopefully you will still end up with a net saving.

If you get a 50% or so compression rate with Huffman coding and then Base64 encode the result (increasing the size again) you will still end up with a result around 30% smaller than the original.

samgak
  • 23,944
  • 4
  • 60
  • 82
  • I believe this is the answer I need. I will continue my research on Huffman coding. Thanks! – Harry Jan 19 '15 at 07:18