I hope I'm asking the right question, but if not I'd like to clarify what I'm trying to do.

I'd like to take a string of alphabetic text and convert it into a 32-character alphanumeric string. I would then like to convert that 32-character alphanumeric string back to the original text. I only care about shortening the length. I don't care about security. I don't care about maintaining the case sensitivity of the original string.

Is something like this even possible or am I trying to create bytes of data out of thin air?

  • Can you give an example of the input, and the expected 32 character string output? What have you tried? – BruceWayne Apr 27 '17 at 04:17
  • There are ways to compress alphabetic text, but there are limits to how much compression is possible. This depends on how long the alphabetic text is (how many total characters, including spaces) and things like: Will there be both upper case and lower case characters? Will there be punctuation? Are we talking English characters, or some other language, or multiple languages? Please be a bit more specific on what you want to accomplish. – Rich Holton Apr 27 '17 at 04:36
  • http://stackoverflow.com/questions/1138345/an-efficient-compression-algorithm-for-short-text-strings – Tim Williams Apr 27 '17 at 05:13
  • @BruceWayne An example: the cow jumped really really high so really really high becomes something like Le893821HekEKAizPqlalzEEl3901pzo. – noknowno Apr 27 '17 at 14:08
  • @RichHolton Sorry. In the input there are a total of 28 potential characters (case insensitive a-z,space,and dash). The output would have a potential total of 62 characters (base64 minus / and +). – noknowno Apr 27 '17 at 14:13

1 Answer

In base 62 (62 different characters), a string of 32 characters can represent 62^32 different values, which comes out to about 2.27e+57.

An arbitrary string of length 39 in your original alphabet of 28 characters (case-insensitive a-z plus space and dash) could be any of 28^39, or about 2.7489271e+56, different patterns. If the length is increased to 40 characters, there are 28^40, or about 7.6969959e+57, different patterns, which is more than 32 base-62 characters can distinguish.

What this means: The theoretical maximum length of an arbitrary string you could store is 39 characters. In practice, it would almost certainly be less than that. Note that the example string you provided in the comments is 55 characters long.
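To make the counting argument concrete, here is a minimal Python sketch of a lossless round trip for strings that do fit. It assumes the 28-symbol input alphabet from the comments (a-z, space, dash) and a 62-symbol output alphabet of digits and letters. The names `max_input_length`, `encode`, and `decode` are made up for illustration, and the numbering scheme (reading the text as one big integer so that strings of different lengths never collide) is just one possible choice, not the only way to do it.

```python
import string

ALPHABET_IN = string.ascii_lowercase + " -"   # 28 input symbols (assumed)
ALPHABET_OUT = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
OUT_LEN = 32


def max_input_length(in_size=len(ALPHABET_IN), out_size=len(ALPHABET_OUT), out_len=OUT_LEN):
    """Longest n such that every string of exactly n input symbols fits in out_len output symbols."""
    capacity = out_size ** out_len
    n = 0
    while in_size ** (n + 1) <= capacity:
        n += 1
    return n


def encode(text):
    """Map text (case-insensitive) to a fixed 32-character base-62 string."""
    text = text.lower()
    if len(text) > max_input_length():
        raise ValueError("text is too long to fit in 32 base-62 characters")
    # Read the text as a number with digits 1..28 (no zero digit), so that
    # "a", "aa", "aaa", ... all get distinct values.
    n = 0
    for ch in text:
        n = n * len(ALPHABET_IN) + ALPHABET_IN.index(ch) + 1
    # Write that number in base 62, left-padded with '0' to 32 characters.
    out = []
    for _ in range(OUT_LEN):
        n, r = divmod(n, len(ALPHABET_OUT))
        out.append(ALPHABET_OUT[r])
    return "".join(reversed(out))


def decode(code):
    """Invert encode(): base-62 string back to the lowercase original text."""
    n = 0
    for ch in code:
        n = n * len(ALPHABET_OUT) + ALPHABET_OUT.index(ch)
    chars = []
    while n > 0:
        n, r = divmod(n - 1, len(ALPHABET_IN))
        chars.append(ALPHABET_IN[r])
    return "".join(reversed(chars))
```

With this sketch, `decode(encode("the cow jumped"))` returns the original text, while the 55-character example from the comments raises `ValueError` because it exceeds the 39-character limit.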

This is for an arbitrary string. Compression algorithms often look for patterns that can be compressed further. But that depends on the nature of the string being compressed.
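As a rough illustration of how much the result depends on the input, the sketch below uses Python's standard `zlib` module as a stand-in general-purpose compressor (an assumption for the example; it is not tuned for short strings). The helper name `compressed_size` is hypothetical.

```python
import zlib


def compressed_size(text):
    """Number of bytes zlib needs for the ASCII text at maximum compression level."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Highly repetitive input compresses well; short or patternless input may not
# shrink at all, since the zlib format itself adds a few bytes of overhead.
repetitive = "really " * 50
sample = "the cow jumped really really high so really really high"

print(len(repetitive), compressed_size(repetitive))
print(len(sample), compressed_size(sample))
```

Even when such a compressor helps, its output is raw bytes, which would still need to be re-encoded into the 62-character alphabet and fit within the same 62^32 limit.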

Rich Holton