How do sites like codepad.org and ideone.com sandbox your program?

Question

I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users don't take down my server?

Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside of that, from consuming too much memory or CPU, or from doing anything else malicious.

I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.

[This](http://goo.gl/k8SMY) may not give you a complete answer, but it should give you some insight into how Sandboxie works. — Pratik, Dec 02 '10 at 05:45
@Shaz Ya, Video has been removed from that link, will try & find if any alternative link. — Pratik, Sep 19 '11 at 10:44

score 24 · Answer 1 · edited Jan 26 '13 at 22:26

codepad.org uses something based on geordi, which runs everything in a chroot (i.e., restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about.

I've previously used Systrace, another utility for restricting system calls.

If the policy is set up properly, the untrusted program is prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there may be no need to put programs in separate chroots and to create and delete those for each run. Doing so would still provide another layer of protection, which probably wouldn't hurt.

So.... do you think I would manually create just one sandbox, and then just throw everything in there then? Not necessary to recreate them for each instance? — mpen, Dec 02 '10 at 19:00

score 17 · Answer 2 · edited May 23 '17 at 12:09

Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:

- Isolation and access control granularity
- Performance and ease of installation/configuration

I eventually decided on a multi-tiered architecture, based on Linux:

Level 0 - Virtualization:

By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
- Clear separation of sensitive from non-sensitive data.
- At the end of the period (e.g. once per day or after each session) the VM is shut down and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
- A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
- Straightforward network filtering: by having the VM on an internal interface, the firewall on the host can selectively filter the network connections.
  
  For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could e.g. have all outgoing connections blocked and allow incoming connections only from within the faculty.
It would also make sense to have a separate VM for the Web-based submission system - one that could upload files to the evaluation VMs, but do little else.

Level 1 - Basic operating-system constraints:

On a Unix OS that would contain the traditional access and resource control mechanisms:
- Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
- Strict user permissions, possibly with ACLs.
- ulimit resource limits on processor time and memory usage.
- Execution under nice to reduce priority over more critical processes. On Linux you could also use ionice and cpulimit - I am not sure what equivalents exist on other systems.
- Disk quotas.
- Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user: more memory and CPU time, access to compiler tools and header files, etc.
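
A minimal sketch of how the `ulimit` and `nice` items above might be combined when a supervising process launches a submission, using Python's standard `resource` and `subprocess` modules. The function name `run_limited` and the default limits are illustrative assumptions, not values from this answer. The supervisor talks to the child over stdin/stdout pipes.

```python
import os
import resource
import subprocess
import sys


def run_limited(argv, stdin_data, cpu_seconds=2, memory_bytes=512 * 1024 * 1024):
    """Run argv with CPU-time and address-space limits, using pipes for I/O.

    Returns (returncode, stdout_bytes, stderr_bytes). A negative return code
    means the child was killed by a signal, e.g. for exceeding its CPU limit.
    """

    def apply_limits():
        # Runs in the child after fork() and before exec(), which is the
        # programmatic equivalent of calling `nice` and `ulimit` in a shell.
        os.nice(10)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        preexec_fn=apply_limits,  # not thread-safe; fine for a simple supervisor
    )
    out, err = proc.communicate(stdin_data)
    return proc.returncode, out, err


if __name__ == "__main__":
    # The Python interpreter stands in for a compiled submission here.
    code, out, err = run_limited(
        [sys.executable, "-c", "print(input().upper())"], b"hello\n"
    )
    print(code, out)
```

Note that `RLIMIT_CPU` counts CPU time, not wall-clock time, so a process that merely sleeps is not stopped by it. Running as a separate user, chroot and disk quotas are not covered by this sketch.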

Level 2 - Advanced operating-system constraints:

On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux to limit access to specific files and/or system calls. Some Linux distributions offer some sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.

Level 3 - User-space sandboxing solutions:

I have successfully used Systrace on a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.

Level 4 - Preemptive strikes:

Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
- Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
- Selective access to system libraries and header files; if you don't want your users to call connect() you might just restrict access to socket.h.
- Static code analysis; disallow assembly code, "weird" string literals (i.e. shell-code) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
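
As a rough illustration of the code-metric and static-analysis checks listed above, here is a small pre-compilation filter for C sources. The function name `check_source`, the line limit and the pattern list are all hypothetical choices for this sketch. It is a plain text scan, not a parser, so it is only a first hurdle and should not be treated as a security boundary.

```python
import re

# Hypothetical policy values; tune them per assignment.
MAX_LINES = 30
FORBIDDEN_PATTERNS = [
    re.compile(r"\b(?:asm|__asm__|__asm)\b"),                 # inline assembly
    re.compile(r"#\s*include\s*<sys/socket\.h>"),             # networking header
    re.compile(r"\b(?:system|fork|execve?|connect)\s*\("),    # restricted calls
]


def check_source(source):
    """Return a list of human-readable problems; an empty list means 'accept'."""
    problems = []
    lines = source.splitlines()

    # Code-metric restriction: reject submissions that are suspiciously long.
    if len(lines) > MAX_LINES:
        problems.append(
            "source too long: {} lines (limit {})".format(len(lines), MAX_LINES)
        )

    # Naive static check: flag every line that matches a forbidden pattern.
    for number, line in enumerate(lines, start=1):
        for pattern in FORBIDDEN_PATTERNS:
            if pattern.search(line):
                problems.append(
                    "line {}: matches {!r}".format(number, pattern.pattern)
                )
    return problems
```

A submission would be passed to the compiler only when `check_source(text)` returns an empty list. The rejected attempts are worth logging.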

Level 0-5 - Monitoring and logging:

You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might be able to make use of administrative means to protect your system, such as:
- calling whatever security officials are in charge of such issues.
- finding that persistent little hacker of yours and offering them a job.

The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.

I think I'll try systrace then; that's a *really* unhelpful website though! Will that allow me to limit to cpu and memory usage and all that too? Or do I kind of need to 'stack' the different programs together for the full effect? One criterion I forgot to mention was that I need to communicate with these programs via pipes. I assume I can do that with systrace? — mpen, Aug 14 '12 at 04:36
IIRC systrace is essentially a system call filter. I do not remember if it has any kind of resource control, but stacking `ulimit`,`nice`, `ionice` e.t.c. is rather standard in the Unix/Linux world. As for the programs, they work as they do outside of systrace, albeit quite slower, as long as systrace does not decide to block a system call... — thkala, Aug 14 '12 at 07:24
Recently I read somewhere that some Linux distributions (Redhat and ...?) have a sandbox policy based on SELinux that even allows the execution of graphical programs. You might want to look at it - if it does what you need it would be definitely more performant and streamlined than systrace. — thkala, Aug 14 '12 at 07:26
when you're using `chroot` how do you prevent users from just executing `exit`. — Yahya Uddin, Mar 18 '16 at 18:59
@YahyaUddin: In cases like this `exit` terminates the `chroot` session and therefore that particular user session. It does not return to a shell, either because it replaced the shell via `exec` or because it was not started via a shell in the first place. — thkala, Mar 19 '16 at 12:02
What do you mean by "replaced the shell via `exec`"? Also, how else can you start a chroot session unless you do it in the shell? (Note: I'm trying to execute a script (e.g. php, java, bash...) in a chroot session.) — Yahya Uddin, Mar 19 '16 at 15:10
@YahyaUddin: 1. see [`execve(2)`](http://linux.die.net/man/2/execve) or the output of `help exec` in Bash. 2. Any program with the `CAP_SYS_CHROOT` capability (typically, but not exclusively, programs run as `root`) can start a constrained process in a `chroot`. Have a look at [`chroot(2)`](http://linux.die.net/man/2/chroot). — thkala, Mar 19 '16 at 23:14
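
The chroot-then-exec sequence discussed in these comments might look roughly like the sketch below. The function name `enter_jail_and_exec` and its parameters are hypothetical. It must run with root privileges (or `CAP_SYS_CHROOT`), normally in a freshly forked child, and it has not been hardened beyond the basic steps.

```python
import os


def enter_jail_and_exec(jail_dir, uid, gid, argv):
    """Confine the current process to jail_dir, drop privileges, then exec.

    Does not return on success: the process image is replaced by argv[0],
    so when that program exits there is no shell left to fall back to.
    """
    os.chroot(jail_dir)  # the filesystem root is now jail_dir
    os.chdir("/")        # do not keep a working directory outside the jail
    os.setgroups([])     # drop supplementary groups before changing ids
    os.setgid(gid)       # change group first, while still privileged
    os.setuid(uid)       # after this call, root privileges are gone
    # argv[0] must be a path that exists *inside* the jail.
    os.execv(argv[0], argv)
```

Because `os.execv` replaces the calling process rather than starting a new one, a later `exit` in the jailed program simply ends that process.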

liuyu · Answer 3 · 2012-08-15T23:02:22.693

I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.

Some additional comments on @thkala's answer:
- It is fair to classify libsandbox as a user-land tool, but libsandbox does integrate standard OS-level security mechanisms (i.e. chroot, setuid, and resource quota).
- Restricting access to C/C++ headers, or static analysis of users' code, does NOT prevent system functions like connect() from being called. This is because user code can (1) declare function prototypes by itself without including system headers, or (2) invoke the underlying, kernel-land system calls without touching wrapper functions in libc.
- Compile-time protection also deserves attention because malicious C/C++ code can exhaust your CPU with infinite template recursion or pre-processing macro expansion.
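
One way to approach the compile-time concern is to give the compiler itself a wall-clock budget and kill its whole process group when the budget runs out. The sketch below uses only Python's standard library. The name `compile_with_timeout` and the 10-second default are assumptions made for illustration, and this is not how libsandbox works.

```python
import os
import signal
import subprocess
import sys


def compile_with_timeout(compile_argv, wall_seconds=10):
    """Run a compiler command; kill its whole process group on timeout.

    Returns (returncode, stdout_bytes, stderr_bytes); returncode is None
    when the compile step was killed for taking too long.
    """
    proc = subprocess.Popen(
        compile_argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # own process group, so helper processes die too
    )
    try:
        out, err = proc.communicate(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        try:
            # The child is a session leader, so its pid is also its group id.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        out, err = proc.communicate()
        return None, out, err
    return proc.returncode, out, err


if __name__ == "__main__":
    # A busy loop stands in for a compiler stuck on runaway template recursion.
    status, _, _ = compile_with_timeout(
        [sys.executable, "-c", "while True: pass"], wall_seconds=1
    )
    print("timed out" if status is None else status)
```

In real use `compile_argv` would be the actual compiler command line, and a memory limit would be needed as well, since runaway expansion can exhaust RAM before the timeout fires.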
