How do I sanitize a list comprehension given by a user?

Question

I am working on an interface for a simulator that is meant to be friendly to people who prefer the command line to a GUI. To give the simulator the levels, the user types the information into a file, which is then parsed and generates design points, which are then sent to a main server.

I would like to implement some sort of "range" feature so that the user will not need to type out all of the individual levels. More power is needed than a simple additive sequence. Since the parser and related code are already in Python, this seems like a perfect use case for list comprehensions. However, the list comprehension is user input and not guaranteed to be valid. Using eval seems too dangerous, and ast.literal_eval does not support list comprehensions.
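To illustrate the limitation, here is a minimal sketch showing how ast.literal_eval treats a plain list versus a list comprehension. The helper name try_literal_eval is hypothetical and exists only for this demonstration.

```python
import ast

def try_literal_eval(source):
    """Return the parsed literal, or the exception raised while parsing it."""
    try:
        return ast.literal_eval(source)
    except (ValueError, SyntaxError) as exc:
        # literal_eval only accepts literals (numbers, strings, lists, dicts, ...),
        # so anything that needs to be computed is rejected.
        return exc

plain = try_literal_eval("[1, 2, 3, 7, 8]")
comprehension = try_literal_eval("[2**x for x in range(5,20) if (x % 3) == 0]")

print(plain)                # the list is parsed as-is
print(type(comprehension))  # an exception, not a list
```

The plain list parses, while the comprehension is rejected because it is an expression to be computed rather than a literal.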

My current goal is for something like this to be valid and safe:

{"Factor 1": [1,2,3,7,8],
"Factor 2": "[2**x for x in range(5,20) if (x % 3) == 0]"}

The base format for files that the user types is JSON. I am looking to extend the language to have additional features (like range) to fill various user needs. "Factor 1" can be parsed in the existing system. The list comprehension will be evaluated on the user's machine, so simple attacks like 'x'*9**999999**99999 are self-destructive.

It seems relatively easy to sanitize the range function using a regex, but I'm not sure how to make sure that the other parts are safe. Are regexes sufficient for this task, or is there another approach I should be following?

Your question as written is a little confusing - are you evaluating Python code on your local machine, and sending the result to a server? Or sending the code to the server, and letting the server do the heavy lifting? — Craig Otis, Oct 05 '14 at 21:55
This looks like Python code. Does the user enter Python code? If so, it has to be parsed by a Python parser. Generally, you wouldn't let a user enter code to be evaled. — , Oct 05 '14 at 21:56
It's not clean, but if you expect simple list comprehensions to be inserted by the user, you can stop them if they take too long to execute: [link](http://stackoverflow.com/questions/492519/timeout-on-a-python-function-call) — Hrabal, Oct 06 '14 at 13:04
@sln The list comprehension itself is not Python code. I am using that structure because it is familiar to my users, and I was hoping that it would be easy to parse. — Will, Oct 06 '14 at 19:57
@CraigOtis The files that the user types specify factors and levels for a simulation. The client machine will do the work of generating design points, which will then be sent to the server. The simulation itself occurs on the server. — Will, Oct 06 '14 at 19:59
Unfortunately, I don't know Python, so I cannot tell the difference between the list and the code. — , Oct 06 '14 at 20:12

score 0 · Accepted Answer · answered Oct 09 '14 at 15:52

Further analysis seems to show that eval is less dangerous than usual here. The parsing is all done client side, and the code of the project is open source. Anything malicious that the user could do by exploiting an eval is therefore either self-destructive or possible through less convoluted methods, so I can just use eval to generate the list of levels.

Of course, I will have to heavily document this decision due to the "eval is evil; kill it with fire" reaction that most people (including myself) have towards its use.
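A minimal sketch of what this could look like, assuming the file is plain JSON in which a string value holds a list comprehension. The function name expand_levels and the check that the result is a list are illustrative assumptions, not the project's actual code.

```python
import json

def expand_levels(text):
    """Parse the JSON spec and expand string-valued factors with eval.

    Hypothetical helper: eval is used deliberately, on the reasoning that
    the expression runs on the machine of the user who typed it.
    """
    spec = json.loads(text)
    expanded = {}
    for factor, levels in spec.items():
        if isinstance(levels, str):
            # A fresh globals dict keeps this module's own names out of the
            # expression. This is NOT a security sandbox: builtins stay available.
            levels = eval(levels, {})
        if not isinstance(levels, list):
            raise ValueError(f"{factor!r} did not produce a list of levels")
        expanded[factor] = levels
    return expanded

example = """
{"Factor 1": [1,2,3,7,8],
 "Factor 2": "[2**x for x in range(5,20) if (x % 3) == 0]"}
"""
print(expand_levels(example))
```

Checking that each result is a list catches typos early, before design points are generated and sent to the server.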
