3

Possible Duplicate:
Compress 21 Alphanumeric Characters in to 16 Bytes

I have 8 numbers

is there a way to convert it to a string which length is 4?

for instance:

input 12345678

output AB2D

the conversion should be able to convert back exactly.

I tried to convert it to 32 Hex, but it still has 5 numbers, and I can't use 0 1 O I in the final string. Is there any good suggestions? btw, only upper case letters could be used.

Community
  • 1
  • 1
shengy
  • 9,461
  • 4
  • 37
  • 61
  • 1
    If you can get a statistic on the occurrence count of your numbers you can encode it using Huffman algo. Bit 1 for the most occurrence, 01 for next highest, 010 for the next. – Siddharth Oct 16 '12 at 09:43
  • 12
    So you have (26-2) + (10-2) = 32 characters. So you can represent at most 32^4 different values, which is 1048576 which is less than 12345678. The [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) says you can't do it. – Joachim Sauer Oct 16 '12 at 09:45
  • 2
    Correct me if I'm wrong however AB2D is not using only upper case letters... – ramsinb Oct 16 '12 at 09:45
  • 3
    Huffman encode will help reduce the average length at the cost of the maximum length. In this case its the maximum length which is the restriction. – Peter Lawrey Oct 16 '12 at 09:46
  • @PeterLawrey: I was assuming that every 8-digit number needs to be representable. – Joachim Sauer Oct 16 '12 at 09:46
  • Unless of course his eight digit string has some special structure that makes it compressible. If the whole 10^8 strings aren't permissible, there might be a way. – Nicholas Wilson Oct 16 '12 at 09:46

2 Answers2

13

It comes down to maths. If you need to be able to represent 10^8 possible numbers and you need to use 4 symbols, each symbol must allow 100 different values. 10^8^(1/4) You can do this using an 8-bit byte but your requirement of only upper case letters could be used. suggests your options are rather limited. You have to determine 100 letters you can use or you have to make assumptions about the range of numbers you can have.

BTW: If you can use non-ASCII upper case characters you don't have a problem. ;)

There are 26 upper case ASCII characters, 30 upper case characters between 128 and 255 and 10 digits so you would have to use 34 symbols as well. If you can use Unicode, there are 1898 upper-case and digit Unicode characters.

There is 163 non lower-case characters between (char) 32 and (char) 255 and if you can use most of these you can do it.


A hand picked list of possible characters will be a better choice but this is an example.

static final char[] ENCODE = new char[100];

static {
    int x = 0;
    for (char i = ' ' + 1; i < 256 && x < 100; i++)
        if (!Character.isLowerCase(i) && !Character.isWhitespace(i))
            ENCODE[x++] = i;
    assert x == ENCODE.length;
}

public static char[] encode(int n) {
    assert n >= 0 && n < 100000000;
    char[] ret = new char[4];
    for (int i = ret.length - 1; i >= 0; i--) {
        ret[i] = ENCODE[n % 100];
        n /= 100;
    }
    return ret;
}

public static int decode(char[] chars) {
    int n = 0;
    for (char ch : chars) {
        int x = Arrays.binarySearch(ENCODE, ch);
        assert x >= 0;
        n = n * 100 + x;
    }
    return n;
}

public static void main(String... args) {
    char[] chars = encode(12345678);
    System.out.println("Encoded: " + new String(chars));
    int n = decode(chars);
    System.out.println("Dencoded: " + n);
}

prints using a character which is non-printable in this font :(

Encoded: -CY
Dencoded: 12345678
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
1

No. Eight digits represents too many combinations. Each decimal digit would have to representable with a single symbol for the whole number to be a string of length four. That means you need to use 100 symbols. As you say, hex doesn't have enough characters, although it's more than decimal. If you can't use lower case letters, I'm assuming that punctuation is also not allowed. In that case, you're just not going to have enough symbols.

Nicholas Wilson
  • 9,435
  • 1
  • 41
  • 80