4

Imagine, there are two same-sized sets of numbers.

Is it possible, and how, to create a function an algorithm or a subroutine which exactly maps input items to output items? Like:

Input = 1, 2, 3, 4
Output = 2, 3, 4, 5

and the function would be:

f(x): return x + 1

And by "function" I mean something slightly more comlex than [1]:

f(x):
    if x == 1: return 2
    if x == 2: return 3
    if x == 3: return 4
    if x == 4: return 5

This would be be useful for creating special hash functions or function approximations.


Update:

What I try to ask is to find out is whether there is a way to compress that trivial mapping example from above [1].

  • 3
    Err, what is your question, because you seemed to have answered it yourself already, the function is: `f(x): return x + 1`. – Bart Kiers Oct 08 '09 at 17:26
  • Creating such a function means guessing. And that’s hard. – Gumbo Oct 08 '09 at 17:27
  • Do you want to know what a function *looks* like? – Bob Kaufman Oct 08 '09 at 17:30
  • Creating such function, yes. I updated the question to better reflect this. –  Oct 08 '09 at 17:30
  • Looks like what you are asking is for a an automated way of figuring out what the exact relationship between input and output values is. I don't think an exact solution (as opposed to an approximate one) is possible. – MAK Oct 08 '09 at 19:10
  • If you want a "toggle function", e.g. given 3 it outputs 5, and given 5 it outputs 3, then: `output=(toggle1+toggle2)-input`. And in our example: `output=8-input`. Note that in a language like C, you can do `x=8-x` to toggle `x`. – Hello World Feb 03 '15 at 20:09

6 Answers6

14

Finding the shortest program that outputs some string (sequence, function etc.) is equivalent to finding its Kolmogorov complexity, which is undecidable.

If "impossible" is not a satisfying answer, you have to restrict your problem. In all appropriately restricted cases (polynomials, rational functions, linear recurrences) finding an optimal algorithm will be easy as long as you understand what you're doing. Examples:

In case of polynomial sequences, it often helps to consider the sequence bn=an+1-an; this reduces quadratic relation to linear one, and a linear one to a constant sequence etc. But there's no silver bullet. You might build some heuristics (e.g. Mathematica has FindSequenceFunction - check that page to get an impression of how complex this can get) using genetic algorithms, random guesses, checking many built-in sequences and their compositions and so on. No matter what, any such program - in theory - is infinitely distant from perfection due to undecidability of Kolmogorov complexity. In practice, you might get satisfactory results, but this requires a lot of man-years.

See also another SO question. You might also implement some wrapper to OEIS in your application.

Fields:

Mostly, the limits of what can be done are described in

  • complexity theory - describing what problems can be solved "fast", like finding shortest path in graph, and what cannot, like playing generalized version of checkers (they're EXPTIME-complete).

  • information theory - describing how much "information" is carried by a random variable. For example, take coin tossing. Normally, it takes 1 bit to encode the result, and n bits to encode n results (using a long 0-1 sequence). Suppose now that you have a biased coin that gives tails 90% of time. Then, it is possible to find another way of describing n results that on average gives much shorter sequence. The number of bits per tossing needed for optimal coding (less than 1 in that case!) is called entropy; the plot in that article shows how much information is carried (1 bit for 1/2-1/2, less than 1 for biased coin, 0 bits if the coin lands always on the same side).

  • algorithmic information theory - that attempts to join complexity theory and information theory. Kolmogorov complexity belongs here. You may consider a string "random" if it has large Kolmogorov complexity: aaaaaaaaaaaa is not a random string, f8a34olx probably is. So, a random string is incompressible (Volchan's What is a random sequence is a very readable introduction.). Chaitin's algorithmic information theory book is available for download. Quote: "[...] we construct an equation involving only whole numbers and addition, multiplication and exponentiation, with the property that if one varies a parameter and asks whether the number of solutions is finite or infinite, the answer to this question is indistinguishable from the result of independent tosses of a fair coin." (in other words no algorithm can guess that result with probability > 1/2). I haven't read that book however, so can't rate it.

Strongly related to information theory is coding theory, that describes error-correcting codes. Example result: it is possible to encode 4 bits to 7 bits such that it will be possible to detect and correct any single error, or detect two errors (Hamming(7,4)).

The "positive" side are:

  • symbolic algorithms for Lagrange interpolation and Pade approximation are a part of computer algebra/symbolic computation; von zur Gathen, Gerhard "Modern Computer Algebra" is a good reference.

  • data compresssion - here you'd better ask someone else for references :)

Community
  • 1
  • 1
sdcvvc
  • 25,343
  • 4
  • 66
  • 102
  • 2
    "If 'impossible' is not a satisfying answer" - haha! Yes, "Kolmogorov complexity" is exactly the right direction. And Thanks for all the links. –  Oct 08 '09 at 21:42
  • The links are incredible, please, don't you have more? What is the field dealing with such things called, anyway? Formal languages? Complexity theory? –  Oct 08 '09 at 21:43
  • Wow! Now that opens some fields I had never heard of. – Matthieu M. Oct 09 '09 at 14:41
4

Ok, I don't understand your question, but I'm going to give it a shot.

If you only have 2 sets of numbers and you want to find f where y = f(x), then you can try curve-fitting to give you an approximate "map".

In this case, it's linear so curve-fitting would work. You could try different models to see which works best and choose based on minimizing an error metric.

Is this what you had in mind?

Here's another link to curve-fitting and an image from that article:

Linear fitting

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Jacob
  • 34,255
  • 14
  • 110
  • 165
  • 1
    Yes, except the curve-fitting produces very high-order polynomial for large sets, am I mistaken? But for smaller cases, yes, this could work, thanks. –  Oct 08 '09 at 17:35
  • You can choose the order of polynomial you want to use for your approximation function. Linear, quadratic, etc. You could also attempt piecewise linear. – mskfisher Oct 08 '09 at 17:38
  • As mskfisher says, curve-fitting can be forced to produce low-degree polynomials (with approximation), and even without approximation, polynomial interpolation can find the lowest-degree polynomial that fits (i.e., it will find "x+1" for the set in your question). You can also try splines, which can sometimes have a shorter description than a single polynomial that fits all points. – ShreevatsaR Oct 08 '09 at 18:25
1

It seems to me that you want a hashtable. These are based in hash functions and there are known hash functions that work better than others depending on the expected input and desired output.

If what you want a algorithmic way of mapping arbitrary input to arbitrary output, this is not feasible in the general case, as it totally depends on the input and output set.

For example, in the trivial sample you have there, the function is immediately obvious, f(x): x+1. In others it may be very hard or even impossible to generate an exact function describing the mapping, you would have to approximate or just use directly a map.

Vinko Vrsalovic
  • 330,807
  • 53
  • 334
  • 373
  • The second one - "algorithmic way of mapping". Why is this not feasible in general case? I have only one input and one output..? –  Oct 08 '09 at 17:36
0

In some cases (such as your example), linear regression or similar statistical models could find the relation between your input and output sets.

Dave Costa
  • 47,262
  • 8
  • 56
  • 72
  • I would like to have an exact mapping of inputs-outputs, I don't believe linear regression can do that...? –  Oct 08 '09 at 17:44
  • If there is a perfect linear relationships between the inputs and outputs, as in your example, linear regression would find it. The correlation of the regression would be 1, to indicate a perfect fit. – Dave Costa Oct 09 '09 at 12:50
0

Doing this in the general case is arbitrarially difficult. For example, consider a block cipher used in ECB mode: It maps an input integer to an output integer, but - by design - deriving any general mapping from specific examples is infeasible. In fact, for a good cipher, even with the complete set of mappings between input and output blocks, you still couldn't determine how to calculate that mapping on a general basis.

Obviously, a cipher is an extreme example, but it serves to illustrate that there's no (known) general procedure for doing what you ask.

Nick Johnson
  • 100,655
  • 16
  • 128
  • 198
0

Discerning an underlying map from input and output data is exactly what Neural Nets are about! You have unknowingly stumbled across a great branch of research in computer science.

ldog
  • 11,707
  • 10
  • 54
  • 70