
I am about to receive a bunch of Python scripts from an untrusted source.

I'd like to be sure that no part of the code can hurt my system, meaning:

(1) the code is not allowed to import ANY MODULE

(2) the code is not allowed to read or write any data, connect to the network, etc.

(The purpose of each script is to loop through a list, compute some data from the input given to it, and return the computed value.)

Before I execute such code, I'd like to have a script 'examine' it and make sure there's nothing dangerous in it that could hurt my system.

I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported).

Yet it would still be possible for the user (if desired) to write code that reads or writes files (say, using open).

So here are my questions:

(1) where can I get a 'global' list of python methods (like open)?

(2) Is there some code that I could add (at the top) to each script that is sent to me that would make some 'global' functions invalid for that script (for example, any use of open would raise an exception)?

I know there are some Python sandboxing solutions, but please try to answer this question, as I feel this approach is more relevant to my needs.

EDIT: suppose I make sure that no import is in the file, and that no possibly harmful functions (such as open, eval, etc.) are in it. Can I conclude that the file is SAFE? (Can you think of any other 'dangerous' ways that built-in functions can be run?)
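A minimal sketch of the closest thing to (2), for Python 3. Rather than adding code at the top of each script, the host controls the namespace the script runs in by handing `exec` a `__builtins__` dict that contains only whitelisted functions. The helper name `run_restricted`, the whitelist, and the convention that a script reads `data` and stores its answer in `result` are all hypothetical. With this setup, any use of `open` raises `NameError`, and an `import` statement raises `ImportError` because `__import__` is missing. This is not a security boundary: attribute tricks (for example starting from `()` and walking to `object`) can still reach the real built-ins.

```python
import builtins

# Hypothetical whitelist: the only built-ins an untrusted script may see.
SAFE_NAMES = ("abs", "enumerate", "len", "max", "min", "range", "sorted", "sum", "zip")


def run_restricted(source, data):
    """Execute `source` with a whitelisted set of built-ins.

    Assumes the script reads its input from `data` and stores its answer in
    `result` (hypothetical conventions, not part of the question).
    """
    safe_builtins = {name: getattr(builtins, name) for name in SAFE_NAMES}
    namespace = {"__builtins__": safe_builtins, "data": data}
    exec(source, namespace)  # open, eval, __import__, ... are simply not defined here
    return namespace.get("result")


print(run_restricted("result = sum(x * x for x in data)", [1, 2, 3]))  # 14
```

This only covers the happy path; resource exhaustion (e.g. `result = sum(range(10**12))`) is not addressed at all.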

user3262424
    "(1) where can I get a 'global' list of python methods (like open)?" Did you actually look at the Python documentation yet? That's already well-defined as the list of built-in functions. Why are you asking? – S.Lott Apr 01 '11 at 18:53
    use a VM instead of running it on a sensitive system. – dting Apr 01 '11 at 18:55
    http://wiki.python.org/moin/SandboxedPython – MK. Apr 01 '11 at 18:58
  • @Blender: thanks, this is what I am looking for. So, trash the script if it contains 'import' or `eval()`. Anything else? How do I disable built-in functions? – user3262424 Apr 01 '11 at 18:58
  • In fact, if I'm not wrong, there are tricky ways to import modules other than the import keyword. – Xavier Combelle Apr 01 '11 at 18:59
    @kriegar: Please post your answer as an answer so we can upvote it. Don't post answers as comments. – S.Lott Apr 01 '11 at 19:02
    @user540009: Trashing functions with `eval()` will have false positives. Perfectly safe scripts use `eval()` perfectly safely. – S.Lott Apr 01 '11 at 19:04
  • "Whitelist, not blacklist." But sandboxing is likely much more viable. –  Apr 01 '11 at 19:06
  • You _can't_ do this from within Python. You need some external tool to confine/sandbox your code. – Allen Apr 01 '11 at 20:18
  • Related question: http://stackoverflow.com/questions/861864/is-there-a-safe-subset-of-python-for-use-as-an-embedded-scripting-language – intuited Apr 01 '11 at 22:07
  • @kriegar: Please post your answer as an answer so we can upvote it. Flamed or not flamed isn't really very interesting. I didn't ask to discuss the nuances of the site. I'm asking if you would please post your answer as an answer so that we can upvote it properly. – S.Lott Apr 02 '11 at 01:59
  • @kriegar: Also, please do not ever post "status" comments. "Sorry. Posted" doesn't help us since it's already obvious you actually posted an answer. It's best to simply delete the useless and uninformative status comment. And the useless discussion comment on "I'm pretty new to stackoverflow..." Neither pertain to this specific question. Deleting them is good because it makes my comments appear insane. – S.Lott Apr 02 '11 at 11:28

7 Answers


This point hasn't been made yet, and should be:

You are not going to be able to secure arbitrary Python code.

A VM is the way to go unless you want security issues up the wazoo.

Katriel
  • If the code imports nothing except other code within the same codebase, and the code does not make use of any absurd built-in functions/keywords such as eval, it's pretty easy to see that the code does not hurt your system. If the code does do imports, you just have to address them one at a time, same for keywords. You can rewrite those parts to not use them, disable features you don't want, etc. A VM can be broken out of. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Mar 31 '12 at 00:36
  • Hmm I wonder if traversing the object graph of only built in whitelisted objects (anything that isn't insane like eval and is not from an import outside of the code being audited) and your own defined objects can let you reach something that isn't whitelisted. – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Mar 31 '12 at 01:05
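The worry in this comment is justified. A minimal demonstration for Python 3: starting from an innocuous empty tuple and using only attribute access and indexing (no names, no imports), code reaches `object` and the list of every direct subclass of `object` loaded in the process, which is the usual starting point for sandbox escapes. A name-level whitelist alone therefore does not close the graph; attribute access (at least to dunder attributes) has to be restricted as well.

```python
# Start from a harmless literal: no names, no calls to built-ins.
start = ()

# tuple -> its class (tuple) -> tuple's base class, which is object
reached = start.__class__.__bases__[0]

# object.__subclasses__() returns every direct subclass of object currently loaded,
# including classes from modules the untrusted code never imported itself.
subclasses = reached.__subclasses__()

print(reached)  # <class 'object'>
print(len(subclasses) > 0)  # True
```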

You can still obfuscate import without using eval:

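s = '__imp'
s += 'ort__'
b = globals()['__builtins__']
# __builtins__ is the builtins module in __main__, but a plain dict elsewhere
d = b if isinstance(b, dict) else b.__dict__
f = d[s]             # f is now __import__
os_module = f('os')  # ** BOOM **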
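To see why the "no `import` word" check proposed in the question is not enough, here is a sketch (Python 3) that runs that naive check against a version of this trick. The helper name `naive_is_safe` is hypothetical. The snippet looks up `__builtins__` defensively, because it is the `builtins` module in `__main__` but a plain dict in other modules and in code run through `exec`.

```python
def naive_is_safe(source):
    # The check proposed in the question: reject any script containing the word "import".
    return "import" not in source


# The obfuscation trick from this answer; the word "import" never appears in it.
OBFUSCATED = """
s = '__imp'
s += 'ort__'
b = globals()['__builtins__']
d = b if isinstance(b, dict) else b.__dict__
f = d[s]
os_module = f('os')
"""

print(naive_is_safe(OBFUSCATED))  # True: the check lets it through

namespace = {}
exec(OBFUSCATED, namespace)  # ...yet the script still obtains the os module
print(namespace["os_module"].__name__)  # os
```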
julx
  • In this case, couldn't you just do `del __builtins__.__import__` before running the script? – Ponkadoodle May 16 '11 at 01:14
  • I think `__import__` is what Python itself uses when it executes an `import` statement. It is sometimes visible in stack traces. So if you actually manage to remove it, there will be no way to import any module from anywhere. That's kind of radical... – julx Oct 07 '11 at 18:54
  • That's not true. You could store it in a local variable, then delete it, and then possibly restore it after you run the user-code. But yeah, still not a good "solution"... – Ponkadoodle Oct 08 '11 at 23:27
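A sketch of the store/delete/restore idea from these comments, for Python 3 (`run_without_import` is a hypothetical helper name). While `__import__` is absent from the built-ins, CPython fails every `import` statement with `ImportError`; the original is put back in a `finally` block so the host program keeps working. As the comment says, this is still not a good solution: it changes process-wide state (other threads are affected while it runs), and it does nothing about `open`, `eval`, or objects already reachable through the object graph.

```python
import builtins


def run_without_import(source):
    """Run untrusted source while builtins.__import__ is temporarily removed."""
    saved = builtins.__import__  # keep a reference so it can be restored
    del builtins.__import__  # import statements now raise ImportError
    try:
        exec(source, {})
    finally:
        builtins.__import__ = saved  # always restore, even if the script raised


try:
    run_without_import("import os")
except ImportError as exc:
    print("blocked:", exc)
```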

See the Python documentation's lists of built-in functions and keywords.

Note that you'll need to look for both "file" and "open", as both can open files (file exists only in Python 2).

Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
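The same two lists can also be read at runtime, which is handy as a starting point for a whitelist or blacklist. A short sketch for Python 3; since `file` and `execfile` existed only in Python 2, they will not appear here.

```python
import builtins
import keyword

# Every public name that is available in a script without any import.
builtin_names = sorted(name for name in dir(builtins) if not name.startswith("_"))

# Dunder built-ins such as __import__ and __build_class__ are easy to overlook.
dunder_names = sorted(name for name in dir(builtins) if name.startswith("__"))

print(len(builtin_names), "public built-in names")
print(dunder_names)
print(keyword.kwlist)  # reserved words such as 'import', 'lambda', 'global'
```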

GreenMatt

An approach that should work better than string matching is to use the ast module: parse the Python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
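A minimal sketch of that idea for Python 3.9+. The node whitelist, the allowed calls, the helper names `check` and `run`, and the `data`/`result` conventions are all hypothetical choices for the "loop over a list and compute a value" scripts described in the question. Any node type not on the list (including `Import`, `Attribute`, `Lambda`, and function definitions) is rejected, names starting with `_` are refused, and only a handful of built-ins can be called. This narrows the attack surface considerably but is not a proven sandbox; for example, it does nothing against `result = 10 ** 10 ** 10` exhausting CPU and memory.

```python
import ast
import builtins

# Hypothetical whitelist of syntax a "loop over a list and compute a value" script needs.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.AugAssign, ast.For, ast.If,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow, ast.USub,
    ast.And, ast.Or, ast.Not, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.List, ast.Tuple, ast.Subscript, ast.ListComp, ast.comprehension, ast.Call,
)
# Hypothetical set of built-in functions the scripts may call.
ALLOWED_CALLS = {"abs", "len", "max", "min", "range", "sum"}


def check(source):
    """Parse source and reject anything outside the whitelist; return the tree."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError(f"disallowed name: {node.id}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS
        ):
            raise ValueError("only whitelisted built-ins may be called")
    return tree


def run(source, data):
    """Compile the checked tree and run it with only the allowed built-ins in scope."""
    code = compile(check(source), "<untrusted>", "exec")
    env = {"__builtins__": {n: getattr(builtins, n) for n in ALLOWED_CALLS}, "data": data}
    exec(code, env)
    return env.get("result")


print(run("result = sum([x * 2 for x in data if x > 1])", [1, 2, 3]))  # 10
```

Because `Attribute` nodes are rejected outright, tricks like `().__class__` or `object().__reduce__()` never reach `compile`.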

See this nice example by Andrew Dalke on manipulating ASTs.

orip

Built-in functions/keywords to watch for:

  • eval
  • exec
  • __import__
  • open
  • file
  • input
  • execfile
  • print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
  • stdin
  • __builtins__
  • globals() and locals() must be blocked otherwise they can be used to bypass your rules

There's probably tons of others that I didn't think about.

Unfortunately, crap like this is possible...

object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")

So it turns out that blocking keywords, restricting imports, and removing the symbols that are in scope by default are not enough on their own; you need to verify the entire object graph...
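To make that concrete, here is a sketch (Python 3) of a name-based blacklist built from the list above (`blacklisted_names` is a hypothetical helper). It uses `ast` so that strings and comments are not mistaken for code. It flags a plain `open(...)` call, but the payload above contains none of the blacklisted names as identifiers: `eval`, `open`, and `__builtins__` appear only inside string literals, so the check reports nothing.

```python
import ast

# Names from the list above; file and execfile only exist in Python 2.
BLACKLIST = {
    "eval", "exec", "__import__", "open", "file", "input", "execfile",
    "print", "stdin", "__builtins__", "globals", "locals",
}


def blacklisted_names(source):
    """Return the blacklisted identifiers that appear as bare names in source."""
    tree = ast.parse(source)
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name) and node.id in BLACKLIST})


# The payload from the answer above; it is only parsed here, never executed.
PAYLOAD = """object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")"""

print(blacklisted_names("f = open('data.txt')"))  # ['open']
print(blacklisted_names(PAYLOAD))  # []
```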


Use a Virtual Machine instead of running it on a system that you are concerned about.

S.Lott
dting

Without a sandboxed environment, the only way to prevent a Python file from harming your system is not to run it. It is easy to write a cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use Docker to sandbox your code. For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but it has a lot of limitations.
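A rough sketch of driving Docker from Python. The image tag, resource limits, and helper names (`docker_command`, `run_untrusted`) are assumptions, not taken from the linked issue. The script is piped into `python -` inside a throwaway container with no network, a read-only filesystem, no Linux capabilities, an unprivileged user, and memory/CPU/process limits. Only the command construction can be exercised without Docker installed.

```python
import subprocess

# Assumed image; any image that provides a Python interpreter would work.
IMAGE = "python:3.12-slim"


def docker_command(image=IMAGE):
    """Build a `docker run` command line that executes a script read from stdin."""
    return [
        "docker", "run",
        "--rm",                                 # delete the container afterwards
        "-i",                                   # keep stdin open to receive the script
        "--network", "none",                    # no network access
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--user", "65534:65534",                # run as an unprivileged user (nobody)
        "--memory", "256m",                     # example memory cap
        "--cpus", "1",                          # example CPU cap
        "--pids-limit", "64",                   # limit process count (fork bombs)
        image, "python", "-",                   # "python -" reads the program from stdin
    ]


def run_untrusted(source, timeout=10):
    """Run source in a container and return its stdout (requires Docker)."""
    completed = subprocess.run(
        docker_command(), input=source, capture_output=True, text=True, timeout=timeout
    )
    return completed.stdout
```

On timeout, `subprocess.run` kills only the local `docker` client process, so the container may keep running; a real setup would also give the container a name (`--name`) and stop it with `docker kill`.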


Another approach would be to write another Python file that parses the original one, removes the bad code, and runs the result. However, that would still be hit-and-miss.