
My script accepts arbitrary-length and -content strings of Python code, then runs them inside exec() statements. If the time taken to run the arbitrary code exceeds some predetermined limit, the exec() statement needs to exit and a boolean flag needs to be set to indicate that a premature exit has occurred.

How can this be accomplished?

Additional information

These pieces of code will be running in parallel in numerous threads (or at least as parallel as you can get with the GIL).

If there is an alternative method in another language, I am willing to try it out.

I plan on cleaning the code to prevent access to anything that might accidentally damage my system (file and system access, import statements, nested calls to exec() or eval(), etc.).

Options I've considered

  1. Since the exec() statements are running in threads, use a poison pill to kill the thread. Unfortunately, I've read that poison pills do not work for all cases.
  2. Running the exec() statements inside processes, then using process.terminate() to kill everything. But I'm running on Windows and I've read that process creation can be expensive. It also complicates communication with the code that's managing all of this.
  3. Allowing only pre-written functions inside the exec() statements and having those functions periodically check for an exit flag then perform clean-up as necessary. This is complicated, time-consuming, and there are too many corner-cases to consider; I am looking for a simpler solution.

I know this is a bit of an oddball question that deserves a "Why would you ever want to allow arbitrary code to run in an exec() statement?" type of response. I'm trying my hand at a bit of self-evolving code. This is my major stumbling block at the moment: if you allow your code to do almost anything, then it can potentially hang forever. How do you regain control and stop it when it does?

Vijchti
  • I'd suggest running the code under a user account that has nearly no privileges -- only enough to run the code. – sarnold Jun 29 '12 at 23:19
  • Since you are doing self-evolving code, it's only "arbitrary" for a certain definition of "arbitrary". That means you might be able to do away with `exec()` altogether, for example by wrapping the code in a function. `exec()` does allow you to isolate the code somewhat, but you get even better isolation in a process. I concur that the best solution for you is probably a process pool, with a process manager that kills any process that takes too long. – Lennart Regebro Jun 30 '12 at 07:17

2 Answers


This isn't a very detailed answer, but it's more than I wanted to put into a comment.

You may want to consider something like this other question for creating functions with timeouts, using multiprocessing as a start.

The problem with threads is that the poison-pill approach probably won't work here: your threads aren't workers pulling many small bits of work off a queue, they're sitting blocked inside a single statement. They would never get around to reading the pill value and exiting.

You mentioned that your concern about using processes on Windows is that they are expensive. So what you might do is create your own kind of process pool (a list of processes). They all pull from a queue, and you submit new tasks to the queue. If any process exceeds the timeout, you kill it and replace it in the pool with a new one. That way you limit the overhead of creating new processes to the cases where a task times out, instead of paying it for every task.
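A rough sketch of that kill-and-replace pool might look like the following. This is my own illustration, not jdi's code; the `manage` function, the heartbeat `status` queue, and the `pool_size`/`timeout` parameters are all assumptions:

```python
import multiprocessing as mp
import queue
import time

def _worker(idx, tasks, status):
    # Each worker loops forever, pulling code strings off the shared queue
    # and reporting when it starts and finishes a task.
    while True:
        code = tasks.get()
        status.put(("start", idx, time.monotonic()))
        try:
            exec(code, {})
        except Exception:
            pass  # a failed task still counts as finished
        status.put(("done", idx))

def manage(code_strings, pool_size=2, timeout=0.5):
    """Run every code string; kill and replace any worker that runs too long.

    Returns (completed_count, killed_count).
    """
    tasks, status = mp.Queue(), mp.Queue()
    workers = {i: mp.Process(target=_worker, args=(i, tasks, status), daemon=True)
               for i in range(pool_size)}
    for w in workers.values():
        w.start()
    for code in code_strings:
        tasks.put(code)

    busy = {}  # worker index -> time its current task started
    done = killed = 0
    while done + killed < len(code_strings):
        try:
            msg = status.get(timeout=0.05)
            if msg[0] == "start":
                busy[msg[1]] = msg[2]
            else:  # "done"
                busy.pop(msg[1], None)
                done += 1
        except queue.Empty:
            pass
        now = time.monotonic()
        for idx, start in list(busy.items()):
            if now - start > timeout:
                workers[idx].terminate()  # kill the hung worker...
                workers[idx].join()
                workers[idx] = mp.Process(  # ...and replace it in the pool
                    target=_worker, args=(idx, tasks, status), daemon=True)
                workers[idx].start()
                busy.pop(idx)
                killed += 1
    for w in workers.values():  # shut the now-idle workers down
        w.terminate()
    return done, killed
```

With this, a batch like `manage(["x = 1", "while True: pass", "y = 2"])` finishes the two quick tasks and kills the hung one, paying for exactly one extra process creation. On Windows (spawn start method) this would also need the usual `if __name__ == "__main__":` guard.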

jdi
  • Thank you for your answer. A process pool was what I had in mind. – Vijchti Jun 30 '12 at 00:09
    +1. But before you do the process pool, make sure it really is too expensive for your use case. How much benefit you get ultimately depends how many processes you're creating, and how often you have to kill them. Also, the tasks will have to clean up after themselves properly (if they aren't killed) for the process pool to work. – abarnert Jun 30 '12 at 01:11

There are a few different options here.

First, start with jdi's suggestion of using multiprocessing. It may be that Windows process creation isn't actually expensive enough to break your use case.
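As a baseline, the plain one-process-per-task version of that is only a few lines. This is a minimal sketch; the `exec_with_timeout` name and the empty globals dict are my own assumptions, not from either answer:

```python
import multiprocessing as mp

def _run(code):
    exec(code, {})  # run the arbitrary code in a bare namespace

def exec_with_timeout(code, timeout):
    """Run `code` in its own process; return True if it had to be killed early."""
    p = mp.Process(target=_run, args=(code,))
    p.start()
    p.join(timeout)    # wait at most `timeout` seconds
    if p.is_alive():
        p.terminate()  # a hard kill is safe here: it's a process, not a thread
        p.join()
        return True    # the premature-exit flag from the question
    return False
```

Only if profiling shows that this per-task process creation really is too slow on Windows does the pool-with-replacement approach from jdi's answer pay off.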

If it actually is a problem, what I'd personally do is use Virtual PC, or even User Mode Linux, to just run the same code in another OS, where process creation is cheap. You get a free sandbox out of that, as well.

If you don't want to do that, jdi's suggestion of a process pool is a bit more work, but should work well as long as you don't have to kill processes very often.

If you really do want everything to be threads, you can do so, as long as you can restrict the way the jobs are written. If the jobs can always be cleanly unwound, you can kill them just by raising an exception in them. Of course they also have to not catch the specific exception you choose to raise. Obviously neither of these conditions is realistic as a general-purpose solution, but for your use case it may be fine.

The key is to make sure your code evolver never inserts any manual resource-management statements (like opening and closing a file); only `with` statements. (Alternatively, it can insert the open and close, but inside a `try`/`finally`.) That's probably a good idea even if you're not doing things this way, because spinning off hundreds of processes that, e.g., each leak as many file handles as they can until they either time out or hit the file limit would slow your machine to a crawl.
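For the record, the raise-an-exception-in-a-thread trick can be sketched in CPython with the internal, CPython-only `ctypes.pythonapi.PyThreadState_SetAsyncExc` call. All the caveats above apply, plus one more: the exception is only delivered between bytecode instructions, so a thread blocked inside a C call will never see it. The `TimeoutKill` and `run_with_deadline` names are my own:

```python
import ctypes
import threading

class TimeoutKill(Exception):
    """Raised asynchronously inside the worker thread by the manager."""

def run_with_deadline(code, timeout):
    result = {"timed_out": False}

    def target():
        try:
            exec(code, {})
        except TimeoutKill:
            result["timed_out"] = True  # the premature-exit flag

    t = threading.Thread(target=target)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # Ask the interpreter to raise TimeoutKill in that thread the next
        # time it executes a bytecode instruction.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(t.ident), ctypes.py_object(TimeoutKill))
        t.join()
    return result["timed_out"]
```

This only works for hangs that keep executing Python bytecode (a `while True: pass` style loop), and it relies on the job not catching `TimeoutKill` and unwinding cleanly, which is exactly the restriction described above.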

If you can restrict the code generator/evolver even further, you could use some form of cooperative threading (e.g., greenlets), which makes things even nicer.

Finally, you could switch from CPython to a different Python implementation that can run multiple interpreter instances in a single process. I don't know whether Jython or IronPython can do so. PyPy can, and it also has a restricted-environment sandbox, but unfortunately I think both of those (and Python 3.x support) are not-ready-for-prime-time features, which means you either have to get a special build of PyPy (probably without the JIT optimizer) or build it yourself. This might be the best long-term solution, but it's probably not what you want today.

abarnert