6

I want to shrink Strings like -1234B56789C;ABC1D3E/FGH4IJKL which are approx 20 - 25 case-insensitive chars.

My goal is to have an alphanumeric string that is a maximum of 16 characters. They must remain human readable.

Is that possible? Are there algorithms that can be used to compress alphanumeric string that also has some special chars?

It must also be possible to revert the compression.

membersound
  • 81,582
  • 193
  • 585
  • 1,120
  • 1
    I see a semicolon in your string. What other non-alphanumeric characters could you have? – La-comadreja Dec 03 '14 at 15:03
  • I don't know exactly yet, but definitely: `-;/` – membersound Dec 03 '14 at 15:04
  • 1
    can you use lowercase characters as well? or are characters interpreted case-insensitive somewhere in your work flow? – cello Dec 03 '14 at 15:15
  • Good point, my strings are fully case-insensitive! – membersound Dec 03 '14 at 15:21
  • If you've got 26*2 + 10 + 3 different values to represent, that's 65 characters. There are (excluding blank and the 0x7F character) 94 characters in the ASCII character set. I haven't done the arithmetic, but I think that might get you about 15% compression. It would be tight. – Hot Licks Dec 05 '14 at 22:31
  • Oh, I see you said "case-INsensitive". Then that's 39 characters, and you should be able to fold into the 94 ASCII characters and achieve about 25% compression. – Hot Licks Dec 05 '14 at 22:35
  • Except if the data is TRANSMITTED in a case-insensitive fashion (such that lower-case can't be used in the coding scheme) then we reduce the 94 characters down to 68, and you're back in the highly questionable range. – Hot Licks Dec 05 '14 at 22:37
  • Does this answer your question? [How to compress a String in Java?](https://stackoverflow.com/questions/3649485/how-to-compress-a-string-in-java) – fantaghirocco Nov 12 '20 at 16:28

2 Answers2

3

I think in general it's not possible unless you use a different target alphabet.
As far as I understand currently your source alphabet is 0-9 and A-Z.
If you extend your target alphabet to include also certain N>0 other chars,
then you can encode an input string with less characters that it originally had
(because e.g. you can encode pairs of chars from the source alphabet with
single chars from the target alphabet).

peter.petrov
  • 38,363
  • 16
  • 94
  • 159
2

You could attempt an LZW-like approach and look for common patterns in your input. For example - if you find that "1234" occurs often in your strings then you could encode that as "Q".

This approach cannot consistently achieve your requirements of a 16-character encoded string unless you can prove that the compression mappings you choose will always occur in the source with sufficient regularity to achieve a 16-character length.

OldCurmudgeon
  • 64,482
  • 16
  • 119
  • 213
  • That's a great idea. But: when decompressing the string again, how can I know if "Q" is a real alphanumeric block or a compressed decimal one? – membersound Dec 03 '14 at 15:20
  • @membersound - You must build a dictionary - anything in the dictionary is translated, anything not is passed through untouched. Have a look at how [LZW](http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch) works. – OldCurmudgeon Dec 03 '14 at 15:27