For a class, I would like to automatically evaluate (parts of) the coding assignments of my students. The setup I had in mind is something like this:
- Students get a class skeleton, which they have to fill in.
- Students "upload" this class definition to a server (or via a web interface).
- The server runs a script that tests specific functions, e.g. class.sigmoid(x), checks whether the output of the function is correct, and might give suggestions (a sketch follows below).
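To make the idea concrete, here is a minimal sketch of what the skeleton handed to students could look like. The class name Assignment and the sigmoid method are purely illustrative:

    class Assignment(object):
        """Skeleton handed out to students; method bodies are to be filled in."""

        def sigmoid(self, x):
            # TODO (student): return the element-wise sigmoid 1 / (1 + exp(-x))
            raise NotImplementedError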
This setup brings a whole lot of problems, since you're evaluating untrusted code. However, it would be extremely useful for many of my classes, so I'm willing to spend some time thinking it through. I remember Coursera had something similar for MATLAB/Octave assignments, but I can't recall the details.
I've looked at many online Python interfaces (e.g. codecademy.com, ideone.com, c9.io). While they seem perfect for learning and sharing code with online evaluation, they lack the option to keep the evaluation script "hidden" from the students (i.e. the evaluation script should contain a correct reference implementation to compare output against, on randomly generated data). Moreover, the course I teach requires some data (e.g. images) and packages (sklearn/numpy), which are not always available there.
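The hidden check I have in mind would look roughly like the sketch below; the reference implementation, the number of trials, and the tolerance are all illustrative assumptions:

    import numpy as np

    def reference_sigmoid(x):
        # correct reference implementation, kept on the server only
        return 1.0 / (1.0 + np.exp(-x))

    def grade_sigmoid(student, trials=10, tol=1e-8):
        # compare the student's implementation against the reference
        # on randomly generated inputs
        for _ in range(trials):
            x = np.random.randn(100)
            if not np.allclose(student.sigmoid(x), reference_sigmoid(x), atol=tol):
                return False
        return True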
Specifically, my questions are:
- Am I missing an online environment that actually offers such functionality? (That would be the easiest option.)
- To set this up myself, I was thinking of hosting it on (e.g.) Amazon's cloud (so there is no problem with infrastructure at the university), but are there any Python practices you could recommend for sandboxing the evaluation? (A first-layer sketch follows after this list.)
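For sandboxing, one common first layer on POSIX systems is to run the student code in a separate process with resource limits and a timeout, roughly as sketched below. This alone is not a real sandbox (the process can still touch the network and filesystem); a container or throwaway VM on the cloud host would be needed on top of it:

    import resource
    import subprocess

    def limit_resources():
        # applied in the child process, just before the student code runs
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))              # 5 s CPU time
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2)   # 256 MB memory

    def run_untrusted(path):
        # run a student script in a separate, resource-limited process
        return subprocess.run(
            ["python", path],
            preexec_fn=limit_resources,  # POSIX only
            capture_output=True,
            timeout=10,                  # wall-clock limit
        )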
Thanks in advance for any suggestions!
A pity to hear that the question is not suitable for Stack Overflow. Thanks to the people who (partially) answered it.
After some more feedback via other channels, I think my approach will be as follows:
- The student gets the skeleton and fills it in.
- The student also has the evaluation script.
- In the script, connections to a server are made to:
  - log in,
  - obtain some random data,
  - check whether the output of the student's code is numerically identical to what the server expects.
In this way the student's code is evaluated locally, but only its output is sent to the server. This limits the kinds of evaluation that are possible, but still allows for a kind of automatic evaluation of code. A sketch of such a client-side script is below.
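As a sketch of this client, assuming a hypothetical grading server with /login, /data, and /check endpoints (all names and the token-based login are illustrative, not an existing API):

    import numpy as np
    import requests

    SERVER = "https://grader.example.org"  # hypothetical grading server

    def evaluate(student, token):
        session = requests.Session()
        # log in
        session.post(SERVER + "/login", data={"token": token})
        # obtain some random data
        x = np.asarray(session.get(SERVER + "/data/sigmoid").json()["x"])
        # run the student's code locally; only the output is sent back
        reply = session.post(SERVER + "/check/sigmoid",
                             json={"result": student.sigmoid(x).tolist()})
        print(reply.json()["feedback"])

Since the server generates the random data and only compares the returned numbers, the reference implementation never leaves the server, and no untrusted code runs there.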