After reading the Software Carpentry essay on Handling Configuration Files I'm interested in their Method #5: put parameters in a dynamically-loaded code module. Basically I want the power to do calculations within my input files to create my variables.

Based on this SO answer on how to import a string as a module, I've written the following function to import a string, an open file object, or a StringIO as a module. I can then access variables using the . operator:

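import types

def make_module_from_text(reader):
    """Make a module from a file, StringIO, or string of source code.

    Parameters
    ----------
    reader : str or file-like object
        Source text, or an object whose read() method returns it.

    Returns
    -------
    m : module
        Module whose namespace holds the names defined by the text.

    """
    # For making a module out of strings/files see https://stackoverflow.com/a/7548190/2530083

    # Accept file-like objects (open files, StringIO) as well as plain strings.
    source = reader.read() if hasattr(reader, 'read') else reader
    # types.ModuleType stands in for imp.new_module (imp was removed in Python 3.12).
    mymodule = types.ModuleType('mymodule')  # may need to randomise the name; not sure
    # exec runs the text with full privileges, so this is only safe for trusted input.
    exec(source, mymodule.__dict__)
    return mymodule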

then

import textwrap
reader = textwrap.dedent("""\
    import numpy as np

    a = np.array([0,4,6,7], dtype=float)
    a_normalise = a/a[-1]    
    """)

mymod = make_module_from_text(reader)
print(mymod.a_normalise)

gives

[ 0.          0.57142857  0.85714286  1.        ]

All well and good so far, but having looked around, it seems that using Python's eval and exec introduces security holes if I don't trust the input. A common response is "Never use eval or exec; they are evil", but I really like the power and flexibility of executing the code. I don't think using {'__builtins__': None} will work for me, as I will want to import other modules (e.g. import numpy as np in my code above). A number of people (e.g. here) suggest using the ast module, but I am not at all clear on how to use it (can ast be used with exec?). Are there simple ways to whitelist/allow specific functionality (e.g. here)? Are there simple ways to blacklist/disallow specific functionality? Is there a magic way to say "execute this, but don't do anything nasty"?
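
On the narrower question of whether ast can be used with exec: compile() accepts an AST produced by ast.parse, so the tree can be inspected before it is executed. The following is only a minimal sketch of that mechanism, not a sandbox. The names ALLOWED_IMPORTS, check_tree and make_checked_module are hypothetical, the whitelist holds math purely for illustration, and the executed text can still reach ordinary builtins such as open.

```python
import ast
import types

# Hypothetical whitelist of top-level modules the input file may import.
ALLOWED_IMPORTS = {"math"}

def check_tree(tree):
    """Raise ValueError if the tree breaks this (illustrative) policy."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None and are rejected below.
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                raise ValueError("import of %r is not allowed" % name)
        # Underscore-prefixed names and attributes are a common escape route.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError("attribute %r is not allowed" % node.attr)
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError("name %r is not allowed" % node.id)

def make_checked_module(text, name="config"):
    """Parse, check, then exec the text in a fresh module namespace."""
    tree = ast.parse(text, filename="<config>", mode="exec")
    check_tree(tree)
    module = types.ModuleType(name)
    # compile() accepts the already-parsed AST, so the text is parsed only once.
    exec(compile(tree, "<config>", "exec"), module.__dict__)
    return module
```

A filter like this removes a few obvious routes but leaves the rest of the language available, so it should not be relied on for untrusted input.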

Basically what are the options for making sure exec doesn't run any nasty malicious code?

EDIT:

My example above of normalising an array within my input/configuration file is perhaps a bit simplistic as to what computations I would want to perform within my input/configuration file (I could easily write a method/function in my program to do that). But say my program calculates a property at various times. The user needs to specify the times in some way. Should I only accept a list of explicit time values, so the user has to do some calculations before preparing the input file? (Note: even using a list as a configuration variable is not trivial; see here.) I think that is very limiting. Should I allow start-end-step values and then use numpy.linspace within my program? I think that is limiting too; what if I want to use numpy.logspace instead? What if I have some function that can accept a list of important time values and then nicely fills in other times to get well-spaced time values? Wouldn't it be good for the user to be able to import that function and use it? What if I want to input a list of user-defined objects? The thing is, I don't want to code for all these specific cases when the functionality of Python is already there for me and my user to use. Once I accept that I do indeed want the power and functionality of executing code in my input/configuration file, I wonder if there is actually any difference, security-wise, between using exec, importlib, imp.load_source and so on. To me there is the limited standard configparser or the all-powerful, all-dangerous exec. I just wish there was some middle ground with which I could say 'execute this... without stuffing up my computer'.
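
For comparison, one possible middle ground is a declarative input file that names a generator from a fixed table, so nothing from the file is ever executed. This is only a sketch under stated assumptions: the JSON layout, the GENERATORS table and load_times are hypothetical, and the pure-Python linspace and logspace below are stand-ins for the numpy functions so the example has no dependencies. It does require coding each allowed case by hand, which is exactly the cost described above.

```python
import json

def linspace(start, stop, num):
    """Pure-Python stand-in for numpy.linspace (endpoint included)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / float(num - 1)
    return [start + i * step for i in range(num)]

def logspace(start, stop, num, base=10.0):
    """Pure-Python stand-in for numpy.logspace."""
    return [base ** exponent for exponent in linspace(start, stop, num)]

# Only functions listed here can be requested by an input file.
GENERATORS = {"linspace": linspace, "logspace": logspace}

def load_times(text):
    """Read 'times' as an explicit list or as {"func": name, "args": [...]}."""
    spec = json.loads(text)["times"]
    if isinstance(spec, list):
        return [float(t) for t in spec]
    name = spec["func"]
    if not isinstance(name, str) or name not in GENERATORS:
        raise ValueError("unknown generator: %r" % (name,))
    return GENERATORS[name](*spec["args"])
```

With this layout, {"times": {"func": "logspace", "args": [0, 2, 3]}} and {"times": [0, 1, 5]} are both valid inputs, while any function name outside the table is rejected.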

rtrwalker
  • Sorry if this is a bit off topic. I skimmed the article you posted and I'm confused how you got from "ways to set configurable values in your program" to "executing random python strings"? They seem like really different concepts. Also, if you are the one writing the python, why would you worry about security? I haven't used `ast` but it seems like it was only used in that post because arithmetic expressions can be parsed trivially using an abstract syntax tree. It's not really designed to run arbitrary python (from a glance). – rliu Nov 01 '13 at 05:10
  • @roliu many scientific computer programs follow the scheme of 'input file'-->Calculations-->'output file'. Preparation of an input file can take considerable time. Take for example my code above: say my program only takes normalised values of `a` (i.e. I need to define `a_norm` somewhere). If I just have a static input file, then I'll always have to normalise my raw `a` values outside my input file. With a dynamic input/configuration file I can do it all within the input file, keeping all my input data together. With a dynamic input/configuration file I am able to preprocess my data. – rtrwalker Nov 01 '13 at 05:46
  • @rtwalker That's a hard example for me to understand because I would just write a method that normalizes the raw values for me. Compare this to the config example in the article you linked. There he computes _configuration settings_ based on other _configuration settings_. Normalizing a matrix vs. setting a couple of flags are fundamentally different to me. The former is something I'd write a function for. The latter is something I'd do by hand (or in some _rare_ cases that I'd want computed... from other flags that I _set by hand_). – rliu Nov 01 '13 at 07:06
  • @rtwalker Sorry, long comment, part 2. The reason that this distinction matters is that the article's cleverness for setting config settings avoids your conundrum because it's only meant for config settings--which can't be malicious code (it's your own code!). But again, if _you_ are writing all of the input strings to your `exec` or `eval` then what's the issue? – rliu Nov 01 '13 at 07:07
  • There are certainly no problems using `exec` when I write the input file (assuming I trust myself :)). I suppose I'm trying to anticipate problems that might arise when others write input files. To me, `exec`ing a file as a module creates a nice general way to access the variables in the input file, while still allowing the user to pre-process their input any way they want within the file. Using one single file is useful because I don't have input data in multiple places. – rtrwalker Nov 01 '13 at 07:57
  • If all you want to do is dynamically import a module from a string, this can be done without using `exec` at all. Look into the import system documentation, `sys.meta_path`, finders, loaders and the like. – l4mpi Nov 01 '13 at 10:44
  • I suppose I can fall back on the fact that anyone using my package would be writing their own input/configuration files and running them on their own machine. Any malicious code introduced would then be self-inflicted. If you're running my code then you have Python, so you can already do nasty things to your own system. But by keeping `exec` in my code I guess I'm precluding any remote use of my code, such as providing access via the web. In those cases code from someone else could be executed on my machine. While I don't envisage providing remote access currently, that could change later. – rtrwalker Nov 04 '13 at 01:22

2 Answers

"Never use eval or exec; they are evil". This is the only answer that works here, I think. There is no fully safe way to use exec/eval on an untrusted string or file.

The best you can do is to come up with your own language, and either interpret it yourself or turn it into safe Python code before handing it to exec. Be careful to proceed from the ground up: if you allow the whole Python language minus specific things you thought of as dangerous, it will never be really safe.

For example, you can use the ast module if you want Python-like syntax; and then write a small custom ast interpreter that only recognizes a small subset of all possible nodes. That's the safest solution.
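
As a rough illustration of that idea (a sketch, not a vetted implementation), the interpreter below accepts only lines of the form name = expression, where expressions are built from numbers, lists, previously assigned names, and + - * /. The names evaluate and load_config are hypothetical. It assumes Python 3.8 or later, where numeric literals parse as ast.Constant, and it deliberately leaves out ** so that an input cannot request enormous computations.

```python
import ast
import operator

# The only operators the interpreter understands; everything else is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def evaluate(node, names):
    """Evaluate one expression node, recognising a small set of node types."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Name) and node.id in names:
        return names[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left = evaluate(node.left, names)
        right = evaluate(node.right, names)
        return BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](evaluate(node.operand, names))
    if isinstance(node, ast.List):
        return [evaluate(element, names) for element in node.elts]
    raise ValueError("unsupported syntax: %s" % type(node).__name__)

def load_config(text):
    """Interpret `name = expression` lines and return the resulting dict."""
    values = {}
    for stmt in ast.parse(text).body:
        simple_assignment = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not simple_assignment:
            raise ValueError("only 'name = expression' lines are allowed")
        values[stmt.targets[0].id] = evaluate(stmt.value, values)
    return values
```

Because the starting point is "nothing is allowed", imports, function calls and attribute access are rejected without being listed anywhere; supporting something like a whitelisted function call would mean adding one more explicit branch.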

Armin Rigo

If you are willing to use PyPy, then its sandboxing feature is specifically designed for running untrusted code, so it may be useful in your case. Note that there are some issues with CPython interoperability mentioned that you may need to check.

Additionally, there is a link on this page to an abandoned project called pysandbox, explaining the problems with sandboxing directly within Python.

DaveP