4

I have a string of numbers that i would like to make shorter for use in a URL. This string always consists of numbers only. For example: 9587661771112

In theory, encrypting a numeric string into an alphanumeric(0-9a-zA-Z) string should always return a shorter result, which is what i want.

I've created an algorithm that does the following:

Encrypt ( string1 = numeric input string, string2 = alphanumeric return string)

  • Takes the next two characters from string1 and converts them into a number, e.g 95 for the above example
  • Checks if the number is less than 52(the combined length of a-z and A-Z)
    • if so, adds ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")[Number] to string2 and jump forward by 2 characters
    • else, add ("0123456789)[First digit of Number) to string2 and jump forward by 1 character

In the next step the number would be 58 and so on.

With some tweaking the shortest result i could get was: 9587661771112 > j9UQpjva

My problem is that with this technique, the result can vary dramaticaly. I also feel that this is not a clean solution to my problem.

So I need an encryption algorithm that converts a string of numbers into a shorter string of uppercase letters, lowercase letters and numbers. It has to be decryptable and have a more or less consistent result.

Any idea how to achieve this?


Solution:

string Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

string Base10To62(long N)
{
    string R = "";
    while (N != 0)
    {
        R += Chars[(int)(N % 62)];
        N /= 62;
    }
    return R;
}

long Base62To10(string N)
{
    long R = 0;
    int L = N.Length;
    for (int i = 0; i < L; i++)
    {
        R += Chars.IndexOf(N[i]) * (long)Math.Pow(62, i);
    }
    return R;
}

works like a charm :)

Maarten Bodewes
  • 90,524
  • 13
  • 150
  • 263
Oht
  • 409
  • 5
  • 19
  • 2
    I think the correct term is __compressing__ - not __encrypting__. – Martin Liversage Aug 01 '12 at 07:44
  • You could call it compressing indeed, but my intention is also to more or less protect the string against editing! – Oht Aug 01 '12 at 07:45
  • 1
    You simply need to convert a number from base 10 to base 52 (see http://stackoverflow.com/questions/923771/quickest-way-to-convert-a-base-10-number-to-any-base-in-net). Forget about "encryption" here -- "encrypting" and "shorter string" do not go together. – Jon Aug 01 '12 at 07:45
  • The only way to be guaranteed to have a shorter code is to code it that way. If that's what you intend and want then yes, but I could take every digit individually, without shorter being a requirement, and come up with one the same length! – ChiefTwoPencils Aug 01 '12 at 07:46
  • @Jon looks promising, ill look into that! – Oht Aug 01 '12 at 07:53
  • @Jon, thanks man, base 52 worked. and i did get a shorter string! – Oht Aug 01 '12 at 08:20
  • 1
    Hi Thomas - if you have a solution then the best thing to do is to post an answer to your own question - that way its clearer what the question is vs the solution is and other people can upvote your answer :) – Justin Aug 01 '12 at 10:10
  • Do you want actual encryption (i.e. a keyed transformation), or is an encoding that converts between bases enough? These are orthogonal concerns. – CodesInChaos Aug 01 '12 at 10:11
  • You may want to pre-compute those powers of 62 though, it's a bit wasteful this way. – Maarten Bodewes Aug 01 '12 at 19:02
  • Your solution works only if the string of digits fits in a long. You did not specify the number of digits. 18 digits will fit in a 64-bit integer (19 into an unsigned 64-bit integer), or 9 digits into a 32-bit integer. – Mark Adler Aug 02 '12 at 00:26
  • 1
    Bugs: 0 results in an empty string for the output of Base10To62; Leading zeros are lost, so not all digit strings can be encoded. – Mark Adler Aug 02 '12 at 06:09

3 Answers3

2

Solution:

string Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static string Base10To62(string S) 
    {
        string R = "";
        var N = long.Parse(S);
        do { R += Chars[(int)(N % 0x3E)]; } while ((N /= 0x3E) != 0);
        return R;
    }

    private static string Base62To10(string S) 
    {
        long R = 0;
        int L = S.Length;
        for (int i = 0; i < L; i++) R += Chars.IndexOf(S[i]) * (long)(System.Math.Pow(0x3E, i));
        return R.ToString();
    }
Oht
  • 409
  • 5
  • 19
1

Linq version for 62 to 10, just for fun:

long Base62To10(string N)
{
    return N.Select((t, i) => Chars.IndexOf(t)*(long) Math.Pow(62, i)).Sum();
}
M. Mennan Kara
  • 10,072
  • 2
  • 35
  • 39
1

If you can add two more characters to your set to make it a nice even 64, then there is a simple, fast algorithm I can describe here.

Encode the digits into a three or four-bit code as follows:

0: 000
1: 001
2: 010
3: 011
4: 100
5: 101
6: 1100
7: 1101
8: 1110
9: 1111

This is a prefix code, which means that you can look at the first three bits to tell if you need to use a fourth. If the first three bits as an integer are greater than 5, then get another bit. So the decoding would be:

get three bits as n
if n < 6
     the result is n + '0'
else
     n = (n << 1) + one more bit
     the result is n - 6 + '0'

The bits are then simply stored six at a time in one of the 64 allowed characters.

This has a problem if you do not know a priori how many digits there are, since there will be ambiguity if you leave four or five bits unused in the last character. In that case, the code can be changed simply to:

0: 000
1: 001
2: 010
3: 011
4: 100
5: 1010
6: 1011
7: 1100
8: 1101
9: 1110
eom: 1111

which takes a few more bits, but provides an unambiguous end-of-message marker.

For the first example, you will store on average 1.76 digits per character. For the second example, 1.71 digits per character, less some amount for the eom marker depending on the number of digits you encode at one time.

If you really can only use 62 characters, then I'll need to think about it a little more.

Update:

A quick look at RFC 1738 indicates that many more characters can be used in a URL:

lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                 "y" | "z"
hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha          = lowalpha | hialpha
digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
unreserved     = alpha | digit | safe | extra

So adding, say, $ and _ to your set would make it 64.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158