0

Is there a way I could encode a long number (e.g. 12349874529768521) as lower-case letters AND numbers for the purposes of reducing its length? The idea is that a user might have a long number on a piece of paper.

It seems to me that with more symbols available, the resulting string could be made shorter. So I'm looking for something like hexadecimal, but using the larger symbol space of A-Z instead of just A-F.

This would be in C# (if it matters).

NickG

6 Answers

5

Base32 encoding is designed to produce an unambiguous, compact, human-readable (and non-obscene!) representation. From Wikipedia:

Base32 has a number of advantages over Base64:

  • The resulting character set is all one case, which can often be beneficial when using a case-insensitive filesystem, spoken language, or human memory.

  • The result can be used as a file name because it can not possibly contain the '/' symbol, which is the Unix path separator.

  • The alphabet can be selected to avoid similar-looking pairs of different symbols, so the strings can be accurately transcribed by hand. (For example, the RFC 4648 symbol set omits the digits for one, eight and zero, since they could be confused with the letters 'I', 'B', and 'O'.)

  • A result excluding padding can be included in a URL without encoding any characters.

Base32 also has advantages over hexadecimal/Base16: Base32 representation takes roughly 20% less space. (1000 bits takes 200 characters, compared to 250 for Base16)
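To put rough numbers on that for the number in the question, here is a minimal sketch that counts how many characters a value needs in a given base. The class and method names (EncodedLength, DigitCount, CharsForBits) are made up for illustration and are not part of any library.

```csharp
using System;

static class EncodedLength
{
    // Number of digits needed to write a non-negative value in the given base,
    // found by repeated integer division (no floating point involved).
    public static int DigitCount(long value, int radix)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (radix < 2) throw new ArgumentOutOfRangeException(nameof(radix));
        int digits = 1;
        while (value >= radix)
        {
            value /= radix;
            digits++;
        }
        return digits;
    }

    // Characters needed to carry a fixed number of bits: ceil(bits / bitsPerChar).
    public static int CharsForBits(int bits, int bitsPerChar)
    {
        return (bits + bitsPerChar - 1) / bitsPerChar;
    }

    static void Main()
    {
        long n = 12349874529768521; // the number from the question
        foreach (int radix in new[] { 10, 16, 32, 36, 64 })
        {
            Console.WriteLine("base {0}: {1} characters", radix, DigitCount(n, radix));
        }
        Console.WriteLine("1000 bits: {0} characters in Base32, {1} in Base16",
            CharsForBits(1000, 5), CharsForBits(1000, 4));
    }
}
```

For 12349874529768521 this reports 17 characters in decimal, 14 in hexadecimal, 11 in base 32 or base 36, and 9 in base 64, so moving from 32 to 36 symbols saves nothing for a number of this size.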

Douglas Crockford's original article on Base32 encoding is also well worth a read.

EDIT: here's a bit of C# that'll do base-N encoding of integers:

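using System;
using System.Text;

class Program {
    private const string BINARY = "01";
    private const string DECIMAL = "0123456789";
    private const string HEX = "0123456789abcdef";
    // 32 symbols: the digits plus lower-case letters, leaving out i, l, o and u.
    private const string BASE32 = "0123456789abcdefghjkmnpqrstvwxyz";

    // Encodes a non-negative int, most significant digit first.
    static string EncodeInt32(string alphabet, int value) {
        if (value < 0) throw new ArgumentOutOfRangeException("value", "value must not be negative");
        // Without this check, zero would encode as an empty string.
        if (value == 0) return alphabet[0].ToString();
        var sb = new StringBuilder();
        while (value > 0) {
            sb.Insert(0, alphabet[value % alphabet.Length]);
            value = value / alphabet.Length;
        }
        return sb.ToString();
    }

    // Decodes with integer arithmetic only (Horner's method) instead of Math.Pow.
    static int DecodeInt32(string alphabet, string value) {
        int result = 0;
        foreach (char c in value) {
            int digit = alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + c);
            // checked: throw on overflow rather than silently wrapping around.
            result = checked(result * alphabet.Length + digit);
        }
        return result;
    }

    static void Main(string[] args) {
        for (var i = 0; i < 1234567890; i += 1234567) {
            Console.WriteLine("{0} {1} {2}", i, EncodeInt32(BASE32, i), DecodeInt32(BASE32, EncodeInt32(BASE32, i)));
        }
        Console.ReadKey(false);
    }
}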

Example output showing typical reduction in string length:

1227159598 14j9y1e 1227159598
1228394165 14kfknn 1228394165
1229628732 14mn99w 1229628732
1230863299 14ntyy3 1230863299
1232097866 14q0mja 1232097866
1233332433 14r6a6h 1233332433
1234567000 14sbztr 1234567000
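The number in the question (12349874529768521) does not fit in an int, so the Int32 methods above cannot encode it directly. Below is a hedged sketch of the same idea for long, using the same 32-symbol alphabet. The class name Base32Long is hypothetical, and the lenient decoding (treating o as 0, and i or l as 1) follows the convention described in Crockford's article rather than anything in the code above.

```csharp
using System;
using System.Text;

static class Base32Long
{
    // Same 32 symbols as BASE32 above (no i, l, o or u).
    private const string Alphabet = "0123456789abcdefghjkmnpqrstvwxyz";

    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value), "value must not be negative");
        if (value == 0) return "0";
        var sb = new StringBuilder();
        while (value > 0)
        {
            // Prepend so the most significant digit ends up first.
            sb.Insert(0, Alphabet[(int)(value % 32)]);
            value /= 32;
        }
        return sb.ToString();
    }

    public static long Decode(string text)
    {
        if (string.IsNullOrEmpty(text)) throw new ArgumentException("text must not be empty", nameof(text));
        long result = 0;
        foreach (char raw in text)
        {
            char c = char.ToLowerInvariant(raw);
            // Forgive common transcription slips: o -> 0, i and l -> 1.
            if (c == 'o') c = '0';
            else if (c == 'i' || c == 'l') c = '1';
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + raw);
            // checked: throw on overflow rather than silently wrapping around.
            result = checked(result * 32 + digit);
        }
        return result;
    }

    static void Main()
    {
        long n = 12349874529768521;
        string encoded = Encode(n);
        Console.WriteLine("{0} -> {1} -> {2}", n, encoded, Decode(encoded));
    }
}
```

The encoded form of the question's number is 11 characters long, against 17 decimal digits, and a user who writes a capital O or I by mistake still gets the right value back.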
Dylan Beattie
  • Base32 does look like a better option than Base36 if humans will be writing the result down. I'll have to remember that for the future! –  Mar 31 '17 at 11:45
3

How about a BaseN class that encodes your long into a string, and decodes it again, using a character set you define yourself?

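using System;

public static class BaseN
{
    private const string CharList = "0123456789abcdefghijklmnopqrstuvwxyz";

    public static String Encode(long input)
    {
        if (input < 0) throw new ArgumentOutOfRangeException("input", input, "input cannot be negative");
        // Without this check, zero would encode as an empty string.
        if (input == 0) return CharList[0].ToString();
        // Digits are produced least significant first; the stack reverses them.
        var result = new System.Collections.Generic.Stack<char>();
        while (input != 0)
        {
            result.Push(CharList[(int)(input % CharList.Length)]);
            input /= CharList.Length;
        }
        return new string(result.ToArray());
    }

    public static long Decode(string input)
    {
        // Horner's method keeps everything in integer arithmetic (no Math.Pow).
        long result = 0;
        foreach (char c in input)
        {
            int digit = CharList.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + c);
            // checked: throw on overflow rather than silently wrapping around.
            result = checked(result * CharList.Length + digit);
        }
        return result;
    }
}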

Usage:

long number = 12349874529768521;
string result = BaseN.Encode(number);

Sample:

https://dotnetfiddle.net/odwFlk

fubo
2

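Here's a similar approach to the others, using a Base-N conversion. Note that AsBaseN writes the least significant digit first and AsLong reads the string back in the same order: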

using System;
using System.Text;

namespace ConsoleApp3
{
    class Program
    {
        static void Main()
        {
            long n = 12349874529768521;

            string baseChars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz@#";

            var encoded = AsBaseN(n, baseChars.ToCharArray());
            Console.WriteLine(encoded); // Prints "9HXNyK2uh"

            long decoded = AsLong(encoded, baseChars.ToCharArray());
            Console.WriteLine(decoded); // Prints "12349874529768521"
        }

        public static string AsBaseN(long value, char[] baseChars)
        {
            var result = new StringBuilder();
            int targetBase = baseChars.Length;

            do
            {
                result.Append(baseChars[value % targetBase]);
                value /= targetBase;
            }
            while (value > 0);

            return result.ToString();
        }

        public static long AsLong(string number, char[] baseChars)
        {
            long result = 0;
            int numberBase = baseChars.Length;
            long multiplier = 1;

            foreach (char c in number)
            {
                result += multiplier * Array.IndexOf(baseChars, c);
                multiplier *= numberBase;
            }

            return result;
        }
    }
}

If you want a different set of allowable characters, just change baseChars as appropriate. For example, if you just want 0-9 and A-Z:

string baseChars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

This gives a result of T3OPA1YNLD3 (base 36) instead of 9HXNyK2uh (base 64).

Matthew Watson
1

You can use a base 36 encoder.

Base36 is a binary-to-text encoding scheme that represents binary data in an ASCII string format by translating it into a radix-36 representation. The choice of 36 is convenient in that the digits can be represented using the Arabic numerals 0–9 and the Latin letters A–Z (the ISO basic Latin alphabet).

Here's an example of one, but any should work: https://github.com/thewindev/csharpbase36

Example Usage

// Encoding
Base36.Encode(10);    // returns "A"
Base36.Encode(10000); // returns "7PS"

// Decoding
Base36.Decode("Z");   // returns 35L
Base36.Decode("10");  // returns 36L
Base36.Decode("7PS"); // returns 10000L

By default, uppercase letters are used. If you really want lowercase, a simple string.ToLowerInvariant() call will convert the result.

However, uppercase is usually easier to read, which is why it's used by default, so you might want to consider using uppercase rather than lowercase.

1

I presume you mean you want to represent the number with fewer characters. Base 36 will do this (0-9, a-z).

xerxes67
0

You could look at Base64 encoding. It uses the characters 0-9, A-Z, a-z, + and /. Or use Base36 if you're only interested in 0-9 and A-Z.
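For reference, here is a minimal sketch of what the framework's built-in Base64 does with a long. The wrapper class Base64Long is hypothetical; Convert.ToBase64String, Convert.FromBase64String, BitConverter.GetBytes and BitConverter.ToInt64 are the standard .NET calls.

```csharp
using System;

static class Base64Long
{
    public static string Encode(long value)
    {
        // GetBytes always yields 8 bytes for a long, in the platform's byte order.
        byte[] bytes = BitConverter.GetBytes(value);
        return Convert.ToBase64String(bytes);
    }

    public static long Decode(string text)
    {
        byte[] bytes = Convert.FromBase64String(text);
        if (bytes.Length != 8) throw new FormatException("Expected exactly 8 bytes");
        return BitConverter.ToInt64(bytes, 0);
    }

    static void Main()
    {
        long n = 12349874529768521;
        string encoded = Encode(n);
        Console.WriteLine("{0} ({1} characters) -> {2}", encoded, encoded.Length, Decode(encoded));
    }
}
```

Because all 8 bytes are encoded regardless of the value, the result is always 12 characters including one trailing '=' of padding. It also mixes upper and lower case and may contain '+' or '/', so for something a person copies from paper, a single-case alphabet such as Base32 or Base36 is probably the safer choice.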