3

I really don't know what the name of this problem is, but it's something like lossy compression, and I have a bad English, but I will try to describe it as much as I can.

Suppose I have list of unsorted unique numbers from unknown source, the length is usually between 255 to 512 with a range from 0 to 512.
I wonder if there is some kind of an algorithm that reads the data and return something like a seed number that I can use to generate a list somehow close to the original but with some degree of error.

For example

  • original list

    {5, 13, 25, 33, 3, 10}
    
  • regenerated list

    {4, 10, 30, 30, 5, 5} or {8, 20, 20, 35, 5, 9} //and so on
    

Does this problem have a name, and is there an algorithm that can do what I just described?
Is it the same as Monte Carlo method because from what I understand it isn't.

Is it possible to use some of the techniques used in lossy compression to get this kind of approximation ?

What I tried to do to solve this problem is to use a simple 16 bit RNG and brute-force all the possible values comparing them to the original list and pick the one with the minimum difference, but I think this way is rather dumb and inefficient.

user1748906
  • 526
  • 4
  • 13
  • Still need more description to solve your problem. – Meysam Tolouee May 10 '14 at 06:41
  • 1
    I think you want to add 'noise' to the original list right? – compie May 10 '14 at 06:43
  • it is something like lossy compression – user1748906 May 10 '14 at 06:44
  • 1
    You could approximate this sequence with a polynomial. In a simple case, that would be a linear function that just passes all points with minimal distance. Using higher order polynomials you could achieve an even better approximation. Trying to condense the whole range of numbers into a single 16-bit number without severe loss (up to complete unrecognizability) is probably not possible. – Ulrich Eckhardt May 10 '14 at 06:45
  • i used 16 bit number because my approach is based on brute force, imagine 255 bit number this would be insane, but it's OK to use a 255 bit number or bigger (in a form of bit array of course) as long as it's fast. – user1748906 May 10 '14 at 06:57
  • Is there any relation between the numbers in the sequence that you could exploit? If it's e.g. a waveform, you could use a Fourier transformation or something like that. – Ulrich Eckhardt May 10 '14 at 07:28
  • unfortunately there isn't – user1748906 May 10 '14 at 07:34
  • You never mention what is your `random` there are many ways in which a random number can be generated. Your problem definitely sounds like polynomial interpolation, but it is hard to understand what exactly is that you want. – Ivaylo Strandjev May 10 '14 at 09:17
  • What kind of errer metric are you working with? Do you want small average error, few wrong values, or few bit errors? – sh1 May 12 '14 at 17:49
  • Do you need the candidate generator to have other qualities for extrapolation or interpolation? Sort of like fractal compression or something. – sh1 May 12 '14 at 17:56

1 Answers1

2

This is indeed lossy compression.

You don't tell us the range of the values in the list. From the samples you give we can extrapolate that they count at least 6 bits each (0 to 63). In total, you have from 0 to 3072 bits to compress.

If these sequences have no special property and appear to be random, I doubt there is any way to achieve significant compression. Think that the probability of an arbitrary sequence to be matched from a 32 bits seed is 2^32.2^(-3072)=7.10^(-916), i.e. less than infinitesimal. If you allow 10% error on every value, the probability of a match is 2^32.0.1^512=4.10^(-503).

A trivial way to compress with 12.5% accuracy is to get rid of the three LSB of each value, leading to 50% savings (1536 bits), but I doubt this is what you are looking for.

Would be useful to measure the entropy of the sequences http://en.wikipedia.org/wiki/Entropy_(information_theory) and/or possible correlations between the values. This can be done by plotting all (V, Vi+1) pairs, or (Vi, Vi+1, Vi+2) triples and looking for patterns.