1

Given a random integer, for example, 19357982357627685397198. How can I compress these numbers into a string of text that has fewer characters?

The string of text must only contain numbers or alphabetical characters, both uppercase and lowercase.

I've tried Base64 and Huffman-coding that claim to compress, but none of them makes the string shorter when writing on a keyboard.

I also tried to make some kind of algorithm that tries to divide the integer by the numbers "2,3,...,10" and check if the last number in the result is the number it was divided by (looks for 0 in case of division by 10). So, when decrypting, you would just multiply the number by the last number in the integer. But that does not work because in some cases you can't divide by anything and the number would stay the same, and when it would be decrypted, it would just multiply it into a larger number than you started with.

I also tried to divide the integer into blocks of 2 numbers starting from left and giving a letter to them (a=1, b=2, o=15), and when it would get to z it would just roll back to a. This did not work because when it was decrypted, it would not know how many times the number rolled over z and therefore be a much smaller number than in the start.

I also tried some other common encryption strategies. For example Base32, Ascii85, Bifid Cipher, Baudot Code, and some others I can not remember.

It seems like an unsolvable problem. But because it starts with an integer, each number can contain 10 different combinations. While in the alphabet, letters can contain 26 different combinations. This makes it so that you can store more data in 5 alphabetical letters, than in a 5 digit integer. So it is possible to store more data in a string of characters than in an integer in mathematical means, but I just can't find anyone who has ever done it.

Hamligt
  • 63
  • 6
  • 1
    @Hmligt I am no expert in encoding, so I may very well be wrong, but it would be hard to compress without losing information if you don't accept a bigger alphabet to use. As far as I see, the best you can do is use a [Base36](https://en.wikipedia.org/wiki/Base36) encoding (digits + 26 english uppercase letters). Also, I don't see how you could use a Base64 encoding with your constraints, as it uses both upper and lowercase letters, digits, + and / – Sandro Massa Jun 01 '21 at 15:52
  • Compressing a single integer is not going to gain much, and if they are truly random then lossless compression algorithms can't help. As @SandroMassa points out, base 36 is optimal within your constraints. Since log2(10)=3.32 and log2(36)=5.17 you have about 1.85 bits of "head room" per character. That means you can encode something like 6 digits in 4 base 36 characters. That's not much, and I would doubt that's enough to bother at all. – President James K. Polk Jun 01 '21 at 18:39
  • 1
    There are 10 digit 0,1,2,4,5,6,7,8,and 9 for base 10. Then you can represent each digit with 4 bits. Easily we can use the hex values. Therefore this encoding reduced the size into half bits, instead of 8-bit ASCII. Base36 will slightly increase the size since 6 bits makes 64, that have number larger than 36. You base64, that is better. – kelalaka Jun 01 '21 at 19:07
  • @SandroMassa I tried Base64 and quickly noticed that it used other symbols than wanted. But I noticed that lowercase letters can be used too, so I will correct the question. If I have lowercase letters in it, it becomes 62 characters. This would mean I need some sort of Base62 algorithm. But there is a problem with this. I need the string to use fewer characters, and I do not need less binary data. If I for example use the 10 character number "6846532136", and then uses Base62, I get "1HMTPz1AZuAu2A", which has 14 characters. This seems to be the issue with all different Base algorithms. – Hamligt Jun 01 '21 at 20:16

2 Answers2

6

You switch from base 10 to eg. base 62 by repeatedly dividing by 62 and record the remainders from each step like this:

Converting 6846532136 to base62: 

   Operation         Result   Remainder
6846532136 / 62    110427937     42
 110427937 / 62      1781095     47
   1781095 / 62        28727     21
     28727 / 62          463     21
       463 / 62            7     29
         7 / 62            0      7

Then you use the remainder as index in to a base62 alphabet of your choice eg:

0         1         2         3         4         5         6
01234567890123456789012345678901234567890123456789012345678901
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

Giving:  H (7) d (29) V (21) V (21) v (47) q (42)   = HdVVvq
                                                      ------
Ebbe M. Pedersen
  • 7,250
  • 3
  • 27
  • 47
  • An excellent answer, though with my nit-picking hat on it does not actually compress single digit numbers. 0..9 are expressed as A..J; single digit to single digit. Anything from 10 upwards is indeed compressed. – rossum Jun 02 '21 at 20:08
  • The numbers I need to compress are really high, and this method works amazing for large numbers. ("10349457109824510924501932690127436509134650913640913409163094109563245091632450916340596134095613094865093465092346091364") – Hamligt Jun 03 '21 at 04:56
-1

It's called base10 to base62, there bunch of solutions and code on the internet.

Here is my favorite version: Base 62 conversion

oraant
  • 309
  • 3
  • 9