For a class, I would like to automatically evaluate (parts of) the coding assignments of my students. The setup I had in mind is something like this:
- Students get a class skeleton, which they have to fill in.
- Students "upload" this class definition to a server (or via a web interface).
- The server runs a script that tests specific functions, e.g. class.sigmoid(x), checks whether the output of the function is correct, and might give suggestions (a sketch follows below).
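To make the idea concrete, here is a minimal sketch of what the skeleton handed to students could look like. The class name Assignment and the sigmoid method are purely illustrative:

    class Assignment(object):
        """Skeleton handed out to students; method bodies are to be filled in."""

        def sigmoid(self, x):
            # TODO (student): return the element-wise sigmoid 1 / (1 + exp(-x))
            raise NotImplementedError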
This setup brings a whole lot of problems, since you're evaluating untrusted code. However, it would be extremely useful for many of my classes, so I'm willing to spend some time thinking it through. I remember Coursera had something similar for MATLAB/Octave assignments, but I can't recall the details.
I've looked at many online Python interfaces (e.g. codecademy.com, ideone.com, c9.io). While they seem perfect for learning and sharing code with online evaluation, they lack the option to keep the evaluation script "hidden" from the students (i.e. the evaluation script should contain a correct reference implementation to compare output against, on randomly generated data). Moreover, the course I teach requires some data (e.g. images) and packages (sklearn/numpy), which are not always available there.
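The hidden check I have in mind would look roughly like the sketch below; the reference implementation, the number of trials, and the tolerance are all illustrative assumptions:

    import numpy as np

    def reference_sigmoid(x):
        # correct reference implementation, kept on the server only
        return 1.0 / (1.0 + np.exp(-x))

    def grade_sigmoid(student, trials=10, tol=1e-8):
        # compare the student's implementation against the reference
        # on randomly generated inputs
        for _ in range(trials):
            x = np.random.randn(100)
            if not np.allclose(student.sigmoid(x), reference_sigmoid(x), atol=tol):
                return False
        return True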
Specifically, my questions are:
- Am I missing an online environment that actually offers such functionality? (That would be the easiest option.)
- To set this up myself, I was thinking of hosting it on (e.g.) Amazon's cloud (so there is no problem with infrastructure at the university), but are there any Python practices you could recommend for sandboxing the evaluation? (A first-layer sketch follows after this list.)
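For sandboxing, one common first layer on POSIX systems is to run the student code in a separate process with resource limits and a timeout, roughly as sketched below. This alone is not a real sandbox (the process can still touch the network and filesystem); a container or throwaway VM on the cloud host would be needed on top of it:

    import resource
    import subprocess

    def limit_resources():
        # applied in the child process, just before the student code runs
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))              # 5 s CPU time
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2)   # 256 MB memory

    def run_untrusted(path):
        # run a student script in a separate, resource-limited process
        return subprocess.run(
            ["python", path],
            preexec_fn=limit_resources,  # POSIX only
            capture_output=True,
            timeout=10,                  # wall-clock limit
        )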
Thanks in advance for any suggestions!
A pity to hear that the question is not suitable for Stack Overflow. Thanks to the people who (partially) answered it.
After some more feedback via other channels, I think my approach will be as follows:
- The student gets the skeleton and fills it in.
- The student also has the evaluation script.
- In the script, connections to a server are made to:
  - log in,
  - obtain some random data,
  - check whether the output of the student's code is numerically identical to what the server expects.
In this way the student's code is evaluated locally, but only its output is sent to the server. This limits the kinds of evaluation that are possible, but still allows for a kind of automatic evaluation of code. A sketch of such a client-side script is below.
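As a sketch of this client, assuming a hypothetical grading server with /login, /data, and /check endpoints (all names and the token-based login are illustrative, not an existing API):

    import numpy as np
    import requests

    SERVER = "https://grader.example.org"  # hypothetical grading server

    def evaluate(student, token):
        session = requests.Session()
        # log in
        session.post(SERVER + "/login", data={"token": token})
        # obtain some random data
        x = np.asarray(session.get(SERVER + "/data/sigmoid").json()["x"])
        # run the student's code locally; only the output is sent back
        reply = session.post(SERVER + "/check/sigmoid",
                             json={"result": student.sigmoid(x).tolist()})
        print(reply.json()["feedback"])

Since the server generates the random data and only compares the returned numbers, the reference implementation never leaves the server, and no untrusted code runs there.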