10

I want to convert a sequence of numbers to a single number which will retain individual values as well as their position. e.g. the following sequence is provided-

1,6,7,8,9,45,67

here,take for example,if I apply simple addition i.e. 1+6+7+8+9+45+67 then a number will be generated. But from that no. we can't extract the individual numbers with their ordering[i.e. 1,6,7,8,9,...].

Is there any way to achieve this feature without any ambiguous deduction [i.e only 1 unique set of numbers will be extracted from a number.]? Is there any mathematical function which will be helpful to get back the individual elements from that number?

Debadyuti Maiti
  • 1,099
  • 4
  • 18
  • 30
  • Are they all positive integers? – Paddy3118 Sep 01 '12 at 16:26
  • 1
    @Paddy3118 Just for now,I'm considering +ve numbers.If you can do it for both +ve & -ve nos. then that will be awesome. – Debadyuti Maiti Sep 01 '12 at 16:42
  • 1
    How big can the resulting number be? It's possible to convert a bitstream of arbitrary length into an appropriate number (it could be a *really really long number decimal with millions of digits*). It is not possible to convert all bitstreams into a number (e.g. and *integer data-type*) of a finite length: e.g. 32-bit, 64-bit, 128-bits .. and if dealing with streams (and not an *integer data-type*), why does the result have to be a "number"? –  Sep 01 '12 at 17:00

8 Answers8

10

You can convert this to a base-N number, where N is one larger than the largest value that will appear in your input sequence.

UPDATE

Based on the various comments, I would like to offer an alternative solution that may be easier to implement. You can consider the sequence to be a UTF-8 encoded string and use Huffman coding with a custom dictionary to achieve a compact representation.

The custom dictionary allows you to store very common characters with very few bits (for example, the sequence separator ',' and the individual characters '0'..'9' can be stored with as few as 3 bits, but also other numbers that you find to be statistically likely to occur can be stored in a short bit sequence. For example, if you find that "42" occurs frequently, you could store "42" in just a few bits.

If you only assign special codes to ',' and '0' to '9', you will average less than 4 bits per character in the input string while retaining the comma separating sequence members. Finding common, multi-character substrings and adding them to the dictionary will only improve on that ratio.

Using a custom dictionary also means that you do not have to store the dictionary in the header of the compressed data, since it is well-known to you.

I did something just like this using SharpZipLib

http://www.icsharpcode.net/opensource/sharpziplib/

http://community.sharpdevelop.net/forums/p/8255/23219.aspx

It is also easy to do with zlib

Compressing small piece of data

Community
  • 1
  • 1
Eric J.
  • 147,927
  • 63
  • 340
  • 553
  • 1
    Can you explain a bit with an example taking the sequence of nos. as provided in question? – Debadyuti Maiti Sep 01 '12 at 16:29
  • @DebadyutiMaiti It is done all the time, but it is so wired in that it is done unaware. Write down a decimal number (Base-10), say: `9142`. There are 4 separate digits that are uniquely separable based on position: how? :) –  Sep 01 '12 at 16:30
  • @pst Yes.But you are considering only digits.[i.e 9,1,4,2] But I'm considering nos. like 45,67 too.How will you treat them [as digit] then? – Debadyuti Maiti Sep 01 '12 at 16:46
  • @DebadyutiMaiti It's the same. The only difference is one has been wired in since grade-school :) Expand the number of a hexadecimal: `12F`, which is Base-16. Once again, each digit is uniquely separable based on position: how? Now imagine a numbering system where each *digit* can be in the range from `(0)` (a value of 0) to `(43)` (a value of 43) and the numbers are written like: `(1)(42)(4)`. Each digit is uniquely separable based on position: how? –  Sep 01 '12 at 16:47
  • [Sexageisml](http://en.wikipedia.org/wiki/Sexagesimal) is still used to talk about geo-coordiantes. I think [Base of a numeral system](http://en.wikipedia.org/wiki/Base_conversion#Base_of_the_numeral_system) might be a good starting place .. –  Sep 01 '12 at 16:51
  • @pst Suppose a number is formed (420) in base 10.When converted to base 42, then no. would be (10)(0). When you will convert the no. back to base 10, you will get back 420 again.But how exactly are you planning to get the individual elements of a sequence from the number 420? Is it 4,20 or 42,0 or 4,2,0? – Debadyuti Maiti Sep 01 '12 at 17:11
  • @DebadyutiMaiti Well, Base-10 is just one way to *view* a number where each digit is given a certain weight (10). Of course that *same* number could be *viewed* as Base-2 or Base-16 or Base-420 (it doesn't change the number, just the way the number is represented). Eric is arguing to view the input base Base-N such that each input number in the sequence is a separate digit. –  Sep 01 '12 at 17:22
  • 1
    "convert this to a base-N number", how you will retrieve `N` when you want do reverse? if there was a way to save something to retrieve it later, simply you could save numbers in string or array, which are faster, but they will take more memory. – Saeed Amiri Sep 01 '12 at 17:29
  • @SaeedAmiri: You can always convert between two number bases. Let's say the largest number in your input sequence is 15, so you select Base 16 as your representation (this may begin to sound familiar). Let's use symbols A for 10, B for 11, etc. Now, the number A3B in base 16 (also called hexidecimal) would be converted to base 10 using the formula: 10 * 16^3 + 3 * 16^2 + 11 * 16^0 (here 16^n means 16 raised to the power of n). Base 16 is just a familiar example. You can do the same wit any number base. – Eric J. Sep 02 '12 at 01:53
  • @EricJ. I said how to save largest number (the number you used as base number), I'm not talking how to convert from one base to another. – Saeed Amiri Sep 02 '12 at 12:56
  • The optimal solution depends on your use case. If the numbers in your input sequence can really be *any* integer number, the solution is different than if there is some sort of natural maximum for the number. For example, if you were reading data points from a device that will always give you things in the range 0..1024, the answer is very different than if you are expecting entirely random integers. Can you share the nature of numbers you will see? – Eric J. Sep 04 '12 at 15:32
  • Also offered an alternative solution. – Eric J. Sep 04 '12 at 15:47
5

It's mathematically possible to do it for finite sequences, but not very practical because the numbers required get very big very quickly: there are 677 (about 242) different length-7 sequences of integers from 1 ... 67, let alone longer sequences and larger integers.

For a simple example of such a function, map the sequence [1,6,7,8,9,45,67] to the value 21 * 36 * 57 * 78 * 119 * 1345 * 1767. The bases are the prime numbers, the powers are the elements in the sequence.

The reverse mapping is computed by division -- the number of times you can divide your value by 2 is the first element in the sequence, etc. The largest prime factor of the value tells you how long the sequence is.

If you want to allow 0 in the sequence as well as positive numbers, then add 1 to all the elements when you raise the primes to the powers. Or alternatively use the power of 2 to give the length of the sequence, then start encoding the elements starting with 3.

Goedel used encodings like this in his proof of his Incompleteness Theorems.

As Kendall Frey says, it is not possible to define a function that maps each infinite sequence of integers to a different integer. This is a consequence of Cantor's proof that the power set of the natural numbers is uncountable: you can't even injectively map all infinite sequences of elements from {true, false} to the integers, let alone all infinite sequences of elements from the integers.

For more practical approaches, think in terms of encoding your sequence of integers as a sequence of bytes, rather than as a number. A finite sequence of bytes can easily be considered to be a binary value, hence it's a number, you just don't really use it as such. A common representation of your example sequence is the sequence of bytes: [1,6,7,8,9,45,67], used for example in JSON. This is a 136-bit number. The mathematical function to reverse this mapping involves arithmetic modulo powers of 256, subtraction of the number 48, multiplication by 10, and suchlike :-)

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • Yep. And if you map the integers from [0, 1, -1, 2, -2, 3, 3, ..] to [1, 2, 3, 4, 5, ..] etc. you could handle negative numbers too in exactly the same way. – DSM Sep 01 '12 at 18:45
  • Indeed. In general, for any countable set `S` you first map the elements of `S` to the positive integers via some injection, then you do the thing I said above. You've demonstrated an injection for the integers. By applying your mapping and then my method twice, you could likewise encode any finite sequence whose elements are finite sequences of integers. – Steve Jessop Sep 01 '12 at 18:48
2

Let's say your sequence is called s and I'll define len(n) to be the number of digits in n.

Then the first digit of your result is len(s[0]), and the following len(s[0]) digits are the number s[0]; then you append len(s[1]) and s[1], and so on.

This works for numbers with at most 9 digits.

Danica
  • 28,423
  • 6
  • 90
  • 122
bigblind
  • 12,539
  • 14
  • 68
  • 123
  • Could be extended to arbitrary-length digits either by encoding the length with two digits (which would get you up to 19 digits), or with a more complex length-encoding scheme. For example, you could put in `len(len(s[1]))` as a single digit, `len(s[1])` with the specified number of digits between 1 and 9, and then `s[1]`; this works for nonnegative integers up to a billion digits. – Danica Sep 01 '12 at 16:28
  • If they are all positive numbers then Zig-Zag encoding (of Protocol Buffer fame) could encode them efficiently in a bitstream. –  Sep 01 '12 at 16:57
1

You can't, if the range of your numbers is infinite.

The power set of natural numbers in uncountable. This means that you can't provide a mapping between sets of numbers and numbers.

What you could do if your numbers are limited to, say 32 bits, is concatenate the numbers into a long binary number, and store them as a sequence of bytes, maybe as a BigNum.

Kendall Frey
  • 43,130
  • 20
  • 110
  • 148
  • 4
    If the members if the input sequence are not infinite, you surely can. For example, if the largest input number is 9, you can unambiguously write 1,0,0,9 as 1009 – Eric J. Sep 01 '12 at 16:21
1

Here is a basic PHP implementation of the Godel numbering described by Steve Jessop above:

<?php

$sec = array(5,9,8,4);

$n = count($sec);
$max = max($sec);

$enc = encode($sec);
$dec = decode($enc, $n, $max);

echo "Input sequence:  " . implode(",", $sec) . "\n";
echo "Output sequence: " . implode(",", $dec) . "\n";
echo "Godel number:    " . $enc;
echo (PHP_INT_MAX/$enc < 20 ? " - too big to decode.\n" : "\n");

function encode($sec) {
    $primes = array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53);
    $enc = 1;
    $i = 0;
    foreach ($sec as $v) {
        $enc = $enc * pow($primes[$i], $v+1);
        $i++;
    }
    return $enc;
}

function decode($enc, $n, $max) {
    $primes = array(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53);
    $sec = array();
    for ($i = 0; $i < $n; $i++) {
        for ($v = 2; $v <= $max+1; $v++) {
            if ($enc/pow($primes[$i], $v) != round($enc/pow($primes[$i], $v))) {
                break;
            }
        }
        $sec[] = $v-2;
    }
    return $sec;
}

?>

This script works only for small numbers in small sequences. It can not handle very large numbers, I guess same as any other implementation. It is even easier and more efficient to store digits concatenating them as they are: [01,05,15,17] -> 1051517.

0

Updated to check the 0,1 case.

Separate the different numbers by 001.

To avoid confusion with 00 inside your numbers, each time a 0 appears in your numbers, replace it with 01.

To decode, split by 001. Replace all 01 by 0.

SJuan76
  • 24,532
  • 6
  • 47
  • 87
  • Ah... I was thinking padding rather than splitting. How about 1,900900900, 2 – Eric J. Sep 01 '12 at 16:22
  • 3
    1,0 does not work, because it has the same encoding as 10,1. It is not reversible. – Sean Owen Sep 01 '12 at 16:22
  • This kind of thing is not going to work. Easier counter-example to the new idea: 10013,7. Encoding is 100130017. That's also the encoding of 1,30017. – Sean Owen Sep 01 '12 at 19:06
  • @SeanOwen not, 10013,7 encoded is 10101130017. Encoding for 1,30017 is 1001301017. Whenever you see 001, it is a break. – SJuan76 Sep 01 '12 at 23:36
0

This uses a universal code, such as Elias omega coding (or any prefix code -- but universal codes are prefix codes with some desirable properties). A prefix code encodes a bit sequence (i.e., a number) as a prefix that basically provides the necessary info to determine how many bits constitute the rest of the number.

1) Use the code to represent the number of elements in the sequence. 2) Then use the code to represent each element.

0

Another answer which occurred to me. Encode each number to balanced ternary, with two bits per trit (e.g., 0=00; +1=01; -1=10). The remaining bit pair (e.g., 11) is the end of element marker, repeated for end of sequence. Con: less space efficient than the prefix code when large values are expected; Pros: 1) more space efficient with mostly small values; 2) encoding/decoding simpler; 3) directly represents negative values.