
I am trying to build a Python sandbox for running students' code in a minimal and safe environment. I intend to run it in a container and to limit its access to the container's resources. So, I am currently designing the part of the sandbox that runs inside the container and handles access to the resources.

For now, my specification is to limit the amount of time and memory used by the process. I also need to be able to communicate with the process through stdin and to catch the retcode, stdout and stderr at the end of the execution.

Moreover, the program may enter an infinite loop and fill up the memory through its stdout or stderr (one student's program crashed my container exactly that way). So, I also want to be able to limit the size of the recovered stdout and stderr: once a certain limit is reached, I can just kill the process and ignore the rest of the output. I do not care about the extra data, as it most likely comes from a buggy program and should be discarded anyway.

For now, my sandbox handles almost everything, meaning that I can:

  • Set a timeout as I want;
  • Set a limit to the memory used in the process;
  • Feed the process through a stdin (for now a given string);
  • Get the final retcode, stdout and stderr.

Here is my current code (I tried to keep it small for the example):

MEMORY_LIMIT  = 64 * 1024 * 1024
TIMEOUT_LIMIT = 5 * 60

__NR_FILE_NOT_FOUND = -1
__NR_TIMEOUT        = -2
__NR_MEMORY_OUT     = -3

def limit_memory(memory):
    import resource
    return lambda: resource.setrlimit(resource.RLIMIT_AS, (memory, memory))

def run_program(cmd, sinput='', timeout=TIMEOUT_LIMIT, memory=MEMORY_LIMIT):
    """Run the command line and output (ret, sout, serr)."""
    from subprocess import Popen, PIPE, TimeoutExpired
    try:
        proc = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                     preexec_fn=limit_memory(memory))
    except FileNotFoundError:
        return (__NR_FILE_NOT_FOUND, "", "")

    sout, serr = b"", b""
    try:
        sout, serr = proc.communicate(sinput.encode("utf-8"), timeout=timeout)
        ret = proc.wait()
    except TimeoutExpired:
        proc.kill()  # communicate() does not kill the child on timeout
        ret = __NR_TIMEOUT
    except MemoryError:
        ret = __NR_MEMORY_OUT
    return (ret, sout.decode("utf-8"), serr.decode("utf-8"))

if __name__ == "__main__":
    ret, out, err = run_program(['./example.sh'], timeout=8)
    print("return code: %i\n" % ret)
    print("stdout:\n%s" % out)
    print("stderr:\n%s" % err)

The missing features are:

  1. Set a limit on the size of stdout and stderr. I looked on the Web and saw several attempts, but none really works.

  2. Attach a function to stdin rather than just a static string. The function should connect to the stdout and stderr pipes and return bytes to stdin based on what it reads.

Does anyone have an idea about that?

PS: I already looked at:

Vadim Kotov
perror
  • You can create your own buffers for STDIN/STDOUT/STDERR, instead of piping them between the processes, and then strictly control their sizes, but the real problem is that you're not really creating a sandbox here - it's trivial to escape it. If you want a proper sandbox use some thin-VM container system like [Docker](https://www.docker.com/) or even [Vagrant](https://www.vagrantup.com/) and then you can control every aspect of it with the benefit of being almost impossible to break out of them. – zwer Dec 06 '17 at 14:58
  • 1
    @zwer: Yes, you are right about the incomplete separation of the sandbox from the host system. This will be the next step and I intend to use QEMU for that. But, the code that I issued here is supposed to run inside the container (I maybe should have mentioned it). Concerning the buffers of `stdin`, `stdout` and `stderr`, this is exactly what I would like to achieve but I do not know exactly how to do it. That's why I am asking. – perror Dec 06 '17 at 15:04

2 Answers


As I was saying, you can create your own buffers and write the STDOUT/STDERR to them, checking the size along the way. For convenience, you can write a small io.BytesIO wrapper to do the check for you, e.g.:

from io import BytesIO

# lets first create a size-controlled BytesIO buffer for convenience
class MeasuredStream(BytesIO):

    def __init__(self, maxsize=1024):  # lets use a 1 KB as a default
        super(MeasuredStream, self).__init__()
        self.maxsize = maxsize
        self.length = 0

    def write(self, b):
        if self.length + len(b) > self.maxsize:  # o-oh, max size exceeded
            # write only up to maxsize, truncate the rest
            super(MeasuredStream, self).write(b[:self.maxsize - self.length])
            raise ValueError("Max size reached, excess data is truncated")
        # plenty of space left, write the bytes and increase the length
        self.length += super(MeasuredStream, self).write(b)
        return len(b)  # convention: return the written number of bytes 

Mind you, if you intend to do truncation / seek & replace you'll have to account for those operations in your length, but this is enough for our purposes.

Anyway, now all you need to do is to handle your own streams and account for the possible ValueError from the MeasuredStream, instead of using Popen.communicate(). This, unfortunately, also means that you'll have to handle the timeout yourself. Something like:

from subprocess import Popen, PIPE, STDOUT, TimeoutExpired
import time

MEMORY_LIMIT  = 64 * 1024 * 1024
TIMEOUT_LIMIT = 5 * 60
STDOUT_LIMIT  = 1024 * 1024  # let's use 1 MB as a STDOUT limit

__NR_FILE_NOT_FOUND      = -1
__NR_TIMEOUT             = -2
__NR_MEMORY_OUT          = -3
__NR_MAX_STDOUT_EXCEEDED = -4  # let's add a new return code

# a monotonic clock for the custom timeout check
# (time.clock was deprecated and removed in Python 3.8)
get_timer = time.monotonic

def limit_memory(memory):
    import resource
    return lambda: resource.setrlimit(resource.RLIMIT_AS, (memory, memory))

def run_program(cmd, sinput='', timeout=TIMEOUT_LIMIT, memory=MEMORY_LIMIT):
    """Run the command line and output (ret, sout, serr)."""
    try:
        # NB: Popen() itself takes no timeout argument - the timeout is
        # handled manually in the loop below
        proc = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=STDOUT,
                     preexec_fn=limit_memory(memory))
    except FileNotFoundError:
        return (__NR_FILE_NOT_FOUND, "")
    sout = MeasuredStream(STDOUT_LIMIT)  # store STDOUT in a measured stream
    start_time = get_timer()  # store a reference timer for our custom timeout
    try:
        proc.stdin.write(sinput.encode("utf-8"))  # write the input to STDIN
        proc.stdin.close()  # flush and close STDIN so the child sees EOF
        while True:  # our main listener loop
            line = proc.stdout.readline()  # read a line from the STDOUT
            # use proc.stdout.read(buf_size) instead to handle your own buffer
            if line != b"":  # content collected...
                sout.write(line)  # write it to our stream
            elif proc.poll() is not None:  # process finished, nothing to do
                break
            # finally, check the current time progress...
            if get_timer() >= start_time + timeout:
                raise TimeoutExpired(proc.args, timeout)
        ret = proc.poll()  # get the return code
    except TimeoutExpired:
        proc.kill()  # we're no longer interested in the process, kill it
        ret = __NR_TIMEOUT
    except MemoryError:
        ret = __NR_MEMORY_OUT
    except ValueError:  # max buffer reached
        proc.kill()  # we're no longer interested in the process, kill it
        ret = __NR_MAX_STDOUT_EXCEEDED
    sout.seek(0)  # rewind the buffer
    return ret, sout.read().decode("utf-8")  # send the results back

if __name__ == "__main__":
    ret, out = run_program(['./example.sh'], timeout=8)
    print("return code: %i\n" % ret)
    print("stdout/stderr:\n%s" % out)

There are two 'issues' with this, though, the first one being quite obvious: I'm piping the subprocess's STDERR to STDOUT, so the result will be a mix of both. Since reading from the STDOUT and STDERR streams is a blocking operation, if you want to read them separately you'll have to spawn two threads (and separately handle their ValueError exceptions when a stream size is exceeded). The second issue is that the subprocess's STDOUT can lock out the timeout check, as the check only runs when STDOUT actually flushes some data. This, too, can be solved by a separate timer thread that forcefully kills the process once the timeout is exceeded. In fact, that's exactly what Popen.communicate() does.
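The two-reader-thread variant could look roughly like this (a sketch only: the `drain` helper and the child command are made up for illustration, and the timeout check lives in the main thread via `wait()`):

```python
import subprocess
import threading

def drain(stream, limit, chunks, overflow):
    """Read a pipe until EOF or until `limit` bytes, then flag the overflow."""
    size = 0
    for chunk in iter(lambda: stream.read(4096), b""):
        take = chunk[:limit - size]     # keep only what fits under the limit
        chunks.append(take)
        size += len(take)
        if size >= limit:
            overflow.set()              # signal the main thread to kill the child
            break

# a trivial child that writes one line to each stream
proc = subprocess.Popen(
    ["python3", "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

overflow = threading.Event()
out_chunks, err_chunks = [], []
t_out = threading.Thread(target=drain, args=(proc.stdout, 1024, out_chunks, overflow))
t_err = threading.Thread(target=drain, args=(proc.stderr, 1024, err_chunks, overflow))
t_out.start(); t_err.start()

try:
    proc.wait(timeout=5)                # the timeout check, in the main thread
except subprocess.TimeoutExpired:
    proc.kill()
if overflow.is_set():                   # a reader hit its size limit
    proc.kill()
t_out.join(); t_err.join()
print(b"".join(out_chunks), b"".join(err_chunks))
```

One caveat: once a reader stops at its limit, the child can still block on a full pipe until the timeout fires, so in a real sandbox you would want to kill it as soon as `overflow` is set rather than waiting for `wait()` to return.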

The principle of operation would essentially be the same, you'll just have to outsource the checks to separate threads and join everything back in the end. That's an exercise I'll leave to you ;)

As for your second missing feature, could you elaborate a bit more what you have in mind?

zwer
  • Thanks a lot for the explanation, I think I now see better where the problems are when implementing this. I'll try to come up with a working solution on my side (I think I need to run several threads, as you mentioned in your answer). Concerning my second feature, I guess it should be a separate question as it seems more complex than I expected. In fact, I would like to replace the string that feeds `stdin` with a function able to read `stdout` and `stderr` and feed `stdin` in consequence of the previous output. This will require asynchronous coding, so I need to look at `asyncio`. – perror Dec 07 '17 at 12:47
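The interactive-stdin feature mentioned in that comment could be sketched with asyncio along these lines (hypothetical names throughout: `respond()` and the example child are made up, and stderr handling is omitted for brevity):

```python
import asyncio

# hypothetical responder: inspect each stdout line, decide what to feed back
def respond(line):
    return b"pong\n" if line.strip() == b"ping" else None

async def interact(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL)  # stderr omitted to keep this short
    transcript = []
    while True:
        line = await proc.stdout.readline()
        if not line:                        # EOF: the child closed its stdout
            break
        transcript.append(line)
        answer = respond(line)              # feed stdin based on the output
        if answer:
            proc.stdin.write(answer)
            await proc.stdin.drain()
    await proc.wait()
    return proc.returncode, b"".join(transcript)

# example child: prints 'ping', waits for a reply, echoes it back
child = ["python3", "-c",
         "print('ping', flush=True); import sys; print('got', input())"]
ret, out = asyncio.run(interact(child))
print(ret, out)
```

A timeout could be layered on top with `asyncio.wait_for()`, which is one reason the asyncio route composes nicely with the other requirements.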

This problem is more complex than it first seems: I had a hard time finding solutions on the Web and understanding them all.

In fact, the complexity comes from the fact that there are several ways to solve the problem. I explored three of them (threading, multiprocessing and asyncio).

Finally, I chose to use a separate thread to listen to the current subprocess and capture the output of the program. It seems to me to be the simplest, most portable and most efficient way to proceed.

So, the basic idea behind this solution is to create a thread that listens to stdout and stderr and gathers all the output. When you reach a limit, you just kill the process and return.

Here is a simplified version of my code:

from subprocess import Popen, PIPE, TimeoutExpired
from queue import Queue
from time import sleep
from threading import Thread

MAX_BUF = 35

def stream_reader(p, q, n):
    stdout_buf, stderr_buf = b'', b''
    while p.poll() is None:
        sleep(0.1)
        # read1() returns whatever is currently available (up to n bytes)
        # instead of blocking until exactly n bytes have arrived
        stdout_buf += p.stdout.read1(n)
        stderr_buf += p.stderr.read1(n)
        if (len(stdout_buf) > n) or (len(stderr_buf) > n):
            stdout_buf, stderr_buf = stdout_buf[:n], stderr_buf[:n]
            try:
                p.kill()
            except ProcessLookupError:
                pass
            break
    q.put((stdout_buf.decode('utf-8', errors="ignore"),
           stderr_buf.decode('utf-8', errors="ignore")))

# Main function    
cmd = ['./example.sh']

proc = Popen(cmd, shell=False, stdin=PIPE, stdout=PIPE, stderr=PIPE)
q = Queue()

t_io = Thread(target=stream_reader, args=(proc, q, MAX_BUF,), daemon=True)
t_io.start()

# Running the process
try:
    proc.stdin.write(b'AAAAAAA')
    proc.stdin.close()
except IOError:
    pass

try:
    ret = proc.wait(timeout=20)
except TimeoutExpired:
    proc.kill()  # kill the child, otherwise the reader thread never exits
    ret = -1 # Or whatever code you decide to give it.

t_io.join()
sout, serr = q.get()

print(ret, sout, serr)

You can attach whatever you want to the example.sh script that is run. Note that several pitfalls are sidestepped here to prevent deadlocks and broken code (I tested this script a bit). Yet, I am not totally sure about it, so do not hesitate to mention obvious errors or improvements.

perror