
I'm embedding Python 3.6 in my application, and I want to disable the import statement in scripts to prevent users from importing any of Python's built-in libraries. I'd like to allow only the language itself and my own C++-defined modules.

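#include <Python.h>
#include <string>

int main ()
{
    // Python 3.6 takes a non-const wchar_t*, so keep the name in static storage
    static wchar_t programName[] = L"Example";
    Py_SetProgramName (programName);
    Py_Initialize ();
    // Borrowed references: these must not be Py_DECREF'd
    PyObject* mainModule = PyImport_AddModule ("__main__");
    PyObject* globals = PyModule_GetDict (mainModule);

    // This should work
    std::string script1 = "print ('example')";
    PyObject* result1 = PyRun_String (script1.c_str (), Py_file_input, globals, globals);
    if (result1 == nullptr) PyErr_Print (); // report the Python exception, if any
    Py_XDECREF (result1);                   // PyRun_String returns a new reference

    // This should not work
    std::string script2 = "import random\n"
                          "print (random.randint (1, 10))\n";
    PyObject* result2 = PyRun_String (script2.c_str (), Py_file_input, globals, globals);
    if (result2 == nullptr) PyErr_Print ();
    Py_XDECREF (result2);

    Py_Finalize ();
    return 0;
}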

Do you know any way to achieve this?

kovacsv
  • Just from a glimpse: remove `eval`, `exec`, `sys` and `os` too. – Rafael Barros Feb 26 '18 at 15:46
  • Of course I'd like to disable everything except my own module. – kovacsv Feb 26 '18 at 15:48
  • Note that you can control all the `import` statements by overwriting the `builtins.__import__` attribute with a custom function. (This doesn't make Python robust against malicious users, but works in your simple examples.) – Armin Rigo Mar 17 '18 at 13:16

1 Answer


Python has a long history of being impossible to sandbox securely (see "How can I sandbox Python in pure Python?" as a starting point, then dive into an old python-dev discussion if you feel like it). Here is what I consider to be your two best options.

Pre-scan the code

Before executing anything, scan the code. You could do this in Python by parsing it with the ast module and walking the tree, or you can likely get far enough with simpler text searches. This will probably work in your scenario because your use cases are restricted; it doesn't generalize to truly arbitrary code.

In your case, you are looking for any import statements (easy) and any top-level variables that are not "approved" (e.g., in a.b.c you care about a, and likely about a.b for a given a). This lets you reject any code that isn't clean before running it.
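A minimal sketch of such a pre-scan with the standard `ast` module, assuming an allow-list approach: it reports every import statement and every name read that is neither approved nor defined by the script itself. `ALLOWED_NAMES` and `find_violations` are hypothetical, and for a.b.c it only checks the root name a, not a.b.

```python
import ast

# Hypothetical allow-list: names a script may read without defining them
# (approved builtins plus your own C++-defined module, here called "mymodule").
ALLOWED_NAMES = {"print", "len", "range", "mymodule"}


def find_violations(source, allowed=ALLOWED_NAMES):
    """Return a list of problems; an empty list means the script looks clean."""
    tree = ast.parse(source)

    # First pass: names the script defines itself may be read back freely.
    defined = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and not isinstance(node.ctx, ast.Load):
            defined.add(node.id)  # x = ..., for x in ..., del x
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)  # function and lambda parameters

    # Second pass: reject imports and reads of unapproved root names.
    # For a.b.c the root is the Name node "a"; a.b is not checked here.
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"line {node.lineno}: import statement")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in allowed and node.id not in defined:
                problems.append(f"line {node.lineno}: unapproved name {node.id!r}")
    return problems
```

Applied to the question's second script, this reports two problems: the import on line 1 and the read of `random` on line 2. The host would run the check first and only call `PyRun_String` when the list is empty.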

The challenge here is that even trivially obfuscated code will bypass your checks. For example, here are some ways to import modules, given access to other modules or globals, that a basic scan for import won't find. You would likely want to restrict direct access to __builtins__, globals, some/most/all names with __double_underscores__, and members of certain types. In an AST, these will unavoidably show up as top-level variable reads or attribute accesses.

getattr(__builtins__, '__imp'+'ort__')('other_module')

vars(globals()['__builti'+'ns__'])['__imp'+'ort__']('other_module')

module.__loader__.__class__(
    "other_module",
    module.__loader__.path + '/../other_module.py'
).load_module()

(I hope it goes without saying that this is an impossible challenge, which is why this approach to sandboxing has never fully succeeded. But it may be good enough, depending on your specific threat model.)
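The restrictions suggested above can be sketched as a second `ast` pass. This is an illustrative deny-list, assuming the hypothetical `BLOCKED_NAMES` set and `DunderChecker` class fit your threat model: it flags direct use of reflective builtins and any `__dunder__` name or attribute. That is enough to catch all three examples, even though the `'__imp'+'ort__'` strings themselves are never inspected.

```python
import ast

# Hypothetical deny-list: builtins that hand out reflective access to objects.
BLOCKED_NAMES = {"getattr", "setattr", "delattr", "globals", "locals",
                 "vars", "eval", "exec", "compile", "open", "__import__"}


def is_dunder(name):
    """True for names like __builtins__ or __class__."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


class DunderChecker(ast.NodeVisitor):
    """Collects (line, name) pairs for blocked names and dunder attributes."""

    def __init__(self):
        self.problems = []

    def visit_Name(self, node):
        # Covers any use of a bare name such as getattr or __builtins__.
        if node.id in BLOCKED_NAMES or is_dunder(node.id):
            self.problems.append((node.lineno, node.id))
        self.generic_visit(node)

    def visit_Attribute(self, node):
        # Covers x.__class__, module.__loader__, and so on.
        if is_dunder(node.attr):
            self.problems.append((node.lineno, "." + node.attr))
        self.generic_visit(node)


def check(source):
    checker = DunderChecker()
    checker.visit(ast.parse(source))
    return checker.problems
```

Blocking every dunder attribute also rejects harmless code such as `obj.__len__()`, which is probably an acceptable trade-off for restricted scripts like yours.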

Runtime auditing

If you are in a position to compile your own Python runtime, you might consider using the (currently draft) PEP 551 hooks. (Disclaimer: I am the author of this PEP.) There are draft implementations against the latest 3.7 and 3.6 releases.

In essence, this would let you add hooks for a range of events within Python and determine how to respond. For example, you can listen to all import events and determine whether to allow or fail them at runtime based on exactly which module is being imported, or listen to compile events to manage all runtime compilation. You can do this from Python code (with sys.addaudithook) or C code (with PySys_AddAuditHook).

The Programs/spython.c file in the repo is a fairly thorough example of auditing from C, while doing it from Python looks more like this (taken from my talk about this PEP):

import sys

def prevent_bitly(event, args):
    if event == 'urllib.Request' and '://bit.ly/' in args[0]:
        print(f'WARNING: urlopen({args[0]}) blocked')
        raise RuntimeError('access to bit.ly is not allowed')

sys.addaudithook(prevent_bitly)

The downside of this approach is that you need to build and distribute your own version of Python rather than relying on a system install. However, this is generally a good idea anyway if your application depends on embedding, as it means you won't have to force users into a specific system configuration.

Zooba
  • Very detailed answer, thank you, I think this is a very good starting point. One question: Why do I have to check top-level variables? What kind of variables can cause trouble in this case? – kovacsv Feb 27 '18 at 18:12
  • @kovacsv My comment started getting long, so I added to the answer instead. – Zooba Feb 27 '18 at 20:48
  • Thank you very much, now I understand the complexity of the problem. – kovacsv Feb 28 '18 at 13:09