16

Suppose you have to write a program that will test all programs in search of one that completes a specific task. For example, consider this JavaScript function:

function find_truth(){
    for(n=0;;++n){
        try {
            var fn = Function(string(n));
            if (fn() == 42)
                return fn;
        } catch() {
            continue;
        }
    }
}

As long as string(n) returns the nth string possible ("a", "b", "c", ... "aa", "ab" ...), this program would eventually output a function that evaluated to 42. The problem with this method is that it is enumerating strings that could or couldn't be a valid program. My question is: is it possible to enumerate programs themselves? How?

nicael
  • 18,550
  • 13
  • 57
  • 90
MaiaVictor
  • 51,090
  • 44
  • 144
  • 286
  • 3
    If you had an abstract syntax tree of your program, and enumerated variations on *that*, perhaps it would be closer to what you are envisioning? That would simply help with the syntax errors, though; semantic problems could still exist, depending on how comprehensively you made the randomizer. – voithos May 22 '13 at 22:47
  • It would be closer, but what are the actual research and relevant works regarding that idea? As in, not reinvent the well. – MaiaVictor May 22 '13 at 22:48
  • 1
    I think you'd have to iterate for a few million years before you came across a *valid* program. Perhaps if it was powered by an [Improbability Drive](https://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Infinite_Improbability_Drive) – Mike Christensen May 22 '13 at 22:49
  • 2
    @Dokkat: I don't know of any *specific* works, but my understanding is that this kind of automated program-space search is a fairly large area of research. It is often done in the context of "automated bug repair". Here's a paper I found with a quick google: http://www.cs.unm.edu/~pdevineni/papers/Strawn.pdf – voithos May 22 '13 at 22:53
  • Probably, but the idea is to make it work in the least amount of time possible - even if it is still a few thousand years. (The goal is not to actually performing this, but to learn how programs could be enumerated in a low entropy way - that should be really interesting). – MaiaVictor May 22 '13 at 22:53
  • 2
    @Dokkat: Here's another resource on an actual research project that tried to implement this: http://dijkstra.cs.virginia.edu/genprog/ – voithos May 22 '13 at 22:55
  • @voithos ok that's interesting but not exactly what I'm looking for. – MaiaVictor May 22 '13 at 23:25
  • 2
    You would be interested in reading about [Chaitin's Omega](http://en.wikipedia.org/wiki/Chaitin%27s_omega) – jberryman May 23 '13 at 00:52

6 Answers6

31

Yes, there are ways that this is possible. A few months ago I made a little experimental project to do something like it, which works reasonably well considering what it's doing. You give it a type and some tests to pass, and it searches for a suitable function that passes the tests. I never put in the work to make it mature, but you can see that it does in fact eventually generate functions that pass the tests in the case of the examples. Requiring the type was an essential component for the practicality of this search -- it drastically reduced the number of programs that needed to be tried.

It is important to have a firm grasp of the theory here before making assertions about what is and is not possible -- there are many who will jump to the halting problem and tell you that it's impossible, when the truth is quite a bit more nuanced than that.

  • You can easily generate all syntactically valid programs in a given language. Naively, you can generate all strings and filter out the ones that parse/typecheck; but there are better ways.
  • If the language is not turing complete -- e.g. the simply-typed lambda calculus, or even something very powerful like Agda, you can enumerate all programs that generate 42, and given any program you can decide "this generates 42" or "this does not generate 42". (The language I used in my experimental project is in this class). The tension here is that in any such language, there will be some computable functions which you cannot write.
  • Even if the language is turing complete, you can still enumerate all programs which eventually generate 42 (do this by running them all in parallel, and reporting success as they finish).

What you cannot do to a turing complete language is decide that a program will definitely not ever generate 42 -- it could run forever trying, and you won't be able to tell whether it will eventually succeed until it does. There are some programs which you can tell will infinite loop, though, just not all of them. So if you have a table of programs, you might expect your enumerator program in the case of a non-turing complete language to be like this:

Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6  |  P7  | ...
42?     | No   | No   | No   | Yes  | No   | No   | No   | ...

Whereas in a turing complete language, your output (at a given time) would be like this:

Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6    |  P7  | ...
42?     | No   | No   | Loop | Yes  | No   | Dunno  | No   | ...

And later that Dunno might turn into a Yes or a No, or it might stay dunno forever -- you just dunno.

This is all very theoretical, and actually generating programs in a turing complete language to search for ones that do a certain thing is pretty hard and takes a long time. Certainly you don't want to do it one by one, because the space is so big; you probably want to use a genetic search algorithm or something.

Another subtle point in the theory: talking about programs which "generate 42" is missing some important detail. Usually when we talk about generating programs, we want it to generate a certain function, not just output something specific. And this is when things you might want to do get more impossible. If you have an infinite space of possible inputs -- say, inputting a single number, then

  • You can't in general tell whether a program computes a given function. You can't check every input manually because infinity is too many to check. You can't search for proofs that your function does the right thing, because for any computable function f, for any axiom system A, there are programs that compute f such that A has no proof that they compute f.
  • You can't tell whether a program is going to give any kind of answer for every possible input. It might work for the first 400,000,000 of them but infinite loop on the 400,000,001st.
  • Similarly, you can't tell whether two programs compute the same function (i.e. same inputs -> same outputs).

There you have it, a daily dose of decidability theory phenomenology.

If you are interested in doing it for real, look into genetic algorithms, and put timeouts on the functions you try and/or use a decidable (non-turing-complete) language.

luqui
  • 59,485
  • 12
  • 145
  • 204
  • Awesome answer. I took some time to read it as you provided great links that I had to read too. Some considerations: if we are finding an algorithm to compute certain function (for example, QuickSort), the best solutions certainly run fast. So long-running functions can be discarded to no damage at all - in fact, we don't care wether it will actually provide the correct result after some time. So I consider the whole halting problem irrelevant here. Then... (continues) – MaiaVictor May 22 '13 at 23:55
  • 1
    I also have a feeling that if we find an algorithm that pass certain tests, it is very likely that we found the right algorithm to do what we want. That is, look how shortly a functional quicksort can be described: qsort = (x)->(h=head(x); concat(qsort(filter(h,x))). Now, how many as short strings would provide a program that sorted 100 tests correctly but were not generic? – MaiaVictor May 23 '13 at 00:01
  • @Dokkat, the problem with using string length as a heuristic is that it doesn't necessarily correlate with your other requirement (that it be efficient). – Wes May 23 '13 at 00:04
  • 1
    @Dokkat, my plan was to use a language with a type system (that can express [parametericity](http://en.wikipedia.org/wiki/Parametricity)) in order to discard a lot of meaningless junk programs and get more guidance from the user about constraints on the desired function. The other component of my plan was a bit of human guidance about how to destructure the problem; e.g. "for sort, you will probably need comparison and list concatenation" (which in turn can be autogenerated from tests, or written directly) – luqui May 23 '13 at 00:10
  • luqui that sounds too great not to exist, are you sure you're not reinventing the well somewhere there? – MaiaVictor May 23 '13 at 00:43
  • @Dokkat, nope, not sure. I'm more of a creator than a researcher though, so it's not really relevant to me. – luqui May 23 '13 at 01:07
  • So, any words on the idea of developing a grammar that is optimal for this, instead of hacking on top of existing grammars? – MaiaVictor May 23 '13 at 04:11
  • @Dokkat, I suspect that will depend on what kinds of specifications you are trying to meet. Lambda calculus is simple and easy to work with, that's usually my default. A simple concatenative language would be good, especially for a genetic approach (I think... for genetic algorithms you want nearby solutions to be found by nearby programs -- do cat languages have that?). But, basically, look at how you want to specify what you want the target program to do, then try to find a language that can use that information as well as possible. – luqui May 23 '13 at 22:44
  • 2
    I made a similar thing with a linear stack machine, because that seemed to allow the most complex behaviour with the shortest programs. https://github.com/gurgeh/CodeSpace – Gurgeh Jun 03 '13 at 11:05
  • (Page not found) – MathCrackExchange Oct 23 '20 at 01:31
4

It is certainly possible to enumerate all programs in a given language that are syntactically valid (and in a statically typed language even only those that type check): You could simply enumerate all strings like you proposed, try to feed each one into a parser for the language and then reject those that can't be parsed. So if that's your definition of a valid program, then yes, it's possible.

It is however not true that your program would necessarily output a function that returns 42 eventually - even if you replace string with a function that only returns syntactically valid programs. If a returned function contains an infinite loop, it will run forever and thus your program will never get to try another function. Thus you might never reach a function that returns 42.

To avoid this issue, you might say that the string(n) function should only produce code that is syntactically valid and does not contain an infinite loop (while still being able to return all such functions). That, however, is not possible because that would entail deciding the Halting Problem (which is, of course, undecidable).

sepp2k
  • 363,768
  • 54
  • 674
  • 675
  • It could be easily corrected with a "halt if N seconds has passed" guard, though. – MaiaVictor May 22 '13 at 22:55
  • 3
    @Dokkat It could, if you know that the given task can be performed in less than N seconds on the given system. For a function that just returns 42 that would be no problem, but I assume that was just an example. If you want to be able to produce functions that solve arbitrary problems, you will have trouble coming up with an N that's large enough to cover everything. – sepp2k May 22 '13 at 22:59
4

As has been noted, you can trivially turn a "generate all strings" program into a "generate all valid programs in language X" by plugging in a parser/compiler for language X. Generally if you can write a program which takes text as input and returns true/false indicating whether the text represents a valid program, then you can use it as a filter on the "generate all strings" program.

You could also design a programming language in which every string of characters is a valid program (perl springs to mind).

Probably more interesting is that given a formal grammar for a language, you could use that to generate programs in the language instead of parsing them. You just need to do a breadth-first traversal of the grammar to be sure that every finite-length program will be reached in some finite time (if you do a depth-first traversal you'll get struck exploring all programs consisting solely of a variable whose name is an ever-longer sequence of 'A's, or something).

Most grammars actually used in parsing programming languages wouldn't work directly for this though; they're usually slightly over-permissive. For example, a grammar may define identifiers as anything matching a regex [_A-Za-z][0-9_A-Za-z]*, which allows variable names of unbounded length, but many language implementations will actually choke on programs with variable names that are gigabytes long. But you could in principle find out about all of these kind of gotchas and write an enumerable grammar that exactly covers all of the valid programs in some language of interest.


So that lets you enumerate all programs. That's not actually sufficient to let you run your find_truth program and find a program that returns 42 though, because it'll get stuck infinitely evaluating the first program that happens to contain an infinite loop.

But it's still actually possible to do this! You just need to pick an order in which to examine all the possibilities so that everything is eventually reached in some finite time. You've got two infinite "dimensions" to search in; the enumeration of all the programs, and the depth of evaluation of each program. You can make sure you cover all bases by doing something like the following strategy:

  1. Run all programs of length up to 1 for up to 1 step
  2. Run all programs of length up to 2 for up to 2 steps
  3. Run all programs of length up to 3 for up to 3 steps
  4. ...

And so on. This guarantees that whatever the length of the program and number of "steps" needed, you'll eventually hit them without needing to have done an infinite amount of work "first" (so long as a program with your desired result actually exists).

If you've got unbounded storage available you can avoid repeating work between these phases (you store all programs of length N that haven't finished in N steps, along with their state, and then in the next round you run the new programs up to N+1 steps, and run all the "pending" programs for one more step each). The definition of "step" doesn't matter much, so long as it doesn't admit infinite loops. Some finite number of bytecodes, or CPU instructions, or even seconds; you'd need a suspendable implementation of the language, of course.

Ben
  • 68,572
  • 20
  • 126
  • 174
2

Assuming I am correctly interpreting your phrase "is it possible to enumerate programs themselves?" Then Yes you can enumerate programs. This is essentially a Genetic Programming problem. See :

http://en.wikipedia.org/wiki/Genetic_programming

Here Programs themselves are created, run, and evaluated (with a resulting fitness value, where the optimum value = 42), just as with Genetic Algorithms, so yes this would provide your enumeration. Furthermore over several generations you could have it evolve your program to produce 42.

NWS
  • 3,080
  • 1
  • 19
  • 34
  • Genetic Programming does indeed generate computer programs, but it does that in a selective manner. Genetic Programming is not a random search, so definitively will not generate everything. – Mihai Oltean Dec 31 '21 at 18:12
0

It is impossible. The problem is that some programs may take a long time to finish computing. And some programs may be stuck in an infinite loop. Ideally you would like to abort running those programs that are stuck in infinite loops, and keep only the long running programs. You could implement a timer, but what if you had a program that ran longer than the timer, but still would return the correct answer?

In general, the only way to see if a computer program will terminate is to run it and see, with the risk of entering an infinite loop. Of course, you could implement some heuristics to recognize common forms of infinite loops, but in general it is impossible. This is known as the halting problem.

EDIT:

I realize that I only partially answer your question. You ask if it is possible to enumerate programs themselves. This is certainly possible. You already have a way of generating all possible strings in order. Then you could just see which strings parse correctly as a javascript program, and just keep those.

Boris
  • 5,094
  • 4
  • 45
  • 71
  • +1 I was trying to remember the term "halting problem", I knew this was related. – Barmar May 22 '13 at 23:00
  • Actually it technically is possible. The halting problem applies when you need to know *for sure* whether one particular program terminates or not. In this case, you can simulate programs for finite numbers of steps and stop once you've found *a* program that terminates with 42 as the result; all you need is to prove that such a program exists. The OP's `find_truth` driver doesn't do this finite-steps simulation though, you'd need something a little more sophisticated. – Ben May 22 '13 at 23:08
  • Doesn't it depend on the language? Some languages are made up of programs (valid ones that can be typed) that necessarily halt. – Wes May 22 '13 at 23:09
  • @Wes Those languages aren't Turing-complete then, which means generating all programs in such a language doesn't cover the normally understood meaning "enumerating computer programs". – Ben May 22 '13 at 23:10
  • @Ben, sure, I agree. I wasn't sure if the OP was asking about just Turing-complete languages or not. – Wes May 22 '13 at 23:12
  • 1
    Okay this is all valid and good but we're not getting to the point: I'm looking for a low-entropy way to enumerate computer programs. To make it more clear, imagine we wanted to enumerate phrases. We could use my method, testing every possible string, or we could test only for combination of dictionary words, because phrases are made by words, not letters. That would be much better. Now, what's the equivalent for computer programs? – MaiaVictor May 22 '13 at 23:17
  • @Dokkat, using a phrase-structure grammar you could simply start with every possible kind of phrase (Noun-phrase, Verb-phrase, etc...), then go through every possible head (sort of like the type of the node), then go through every possible complement (branch) and do the same thing for each. Essentially a breadth first search of all possible (finite) phrase trees. You would also need to take into account optional branches for di-transitive verbs, and determiners, etc... but that's the basic idea. – Wes May 22 '13 at 23:22
  • 1
    @Dokkat: The equivalent for computer programs would be to enumerate [ASTs](http://en.wikipedia.org/wiki/Abstract_syntax_tree). – hammar May 22 '13 at 23:25
  • So, this becomes: 1. How do you enumerate trees? 2. TBH I'm not happy with this - even if I could enumerate Lisp ASTs I'd be generating a lot of non-working programs in the process. The idea was to enumerate programs that did something, whatever it was. – MaiaVictor May 22 '13 at 23:27
  • @Dokkat, well they would be syntactically valid. You can easily add more restrictions on the type of the branches for each node to increase the likelihood of it being a "working" program. – Wes May 22 '13 at 23:29
  • 1
    whether a given program halts or not is immaterial to the business of enumerating them. it is trivially possible to enumerate them given the symbol space is finite (e.g. ascii characters) and a valid program is finite length. – wim May 23 '13 at 04:11
  • @Downvoter: Please state your reason for downvoting. Otherwise, how could I improve my answers? – Boris May 23 '13 at 07:39
  • 1
    @Boris Not my downvote(s) but: 5 Answers, 4 of which say "yes" you can enumerate a program, and 1 - yours which says "impossible". I suspect your downvotes are because people think you are wrong. – NWS May 23 '13 at 11:23
0

I would suggest starting from the grammar of javascript, for example of ANTLR.

https://github.com/antlr/grammars-v4/blob/master/javascript/javascript/JavaScriptParser.g4

The grammar defines a directed graph similar to this one:

grammar Exp;

/* This is the entry point of our parser. */
eval
    :    additionExp
    ;

/* Addition and subtraction have the lowest precedence. */
additionExp
    :    multiplyExp 
         ( '+' multiplyExp 
         | '-' multiplyExp
         )* 
    ;

/* Multiplication and division have a higher precedence. */
multiplyExp
    :    atomExp
         ( '*' atomExp 
         | '/' atomExp
         )* 
    ;

Using this structure you can create a program that creates all gramatically valid programs of different depths 1, 2, 3, 4, ... and runs these in an emulator. These would be syntactically valid programs although many would return errors (think division by zero or accessing an undefined variable). Also some you would not be able to prove if they finish or not. But generating as many gramatically correct programs is possible with a properly defined grammar like the one provided by ANTLR.

RogerS
  • 1,322
  • 9
  • 11