I'm writing a convenience library and looking for a best-practice way to load lambda functions from a text file.

  • The library is designed to import one or more datasets (Facebook Insights, for example) in a known format and manipulate the data into a Pandas dataframe that can then be plotted either in an IPython notebook or a webpage.
  • Each definition contains functions: aggregation functions for DataFrame.groupby() and lambda functions for DataFrame.apply().
  • I've currently hard-coded the rules to manipulate each file into a dict but I'd like to abstract these into a series of json files so that I can more easily add definitions.

For the aggregation methods, the list is fairly short so I can easily make a list of if statements. However, by definition the apply lambda functions are bespoke for each definition. Here's an example which takes a couple of columns to derive a percentage:

lambda x: float(x[1]) / float(x[0]) * 100

I'm aware of eval, but it doesn't sound like good practice: I'd like to open this up for others to use one day, and eval is open to abuse. The jsonpickle library is similar and is also open to abuse in principle. The alternative would be a fixed list of functions, but I don't see how this type of arbitrary function can be made into a fixed list.

Has anyone got similar experience and able to offer a best practice approach?

Phil Sheard
  • Can you give an example of the json that would be parsed by your library? – Phillip Cloud May 03 '14 at 14:52
  • @PhillipCloud sure - at the moment the `lambda` example above is stored against a dictionary key to be applied against the dataframe. I'd planned to then store that out to json. The challenge is how to bring that back into Python without having to create a new `if` option for each successive item; I'd like to be able to define them on the fly within json files. – Phil Sheard May 03 '14 at 15:12
  • do you need arbitrary but pure mathematical functions? – jfs May 05 '14 at 10:47

1 Answer

While it's true that eval can be a security hole, it's possible to restrict what is available to it by modifying the globals:

>>> f = eval('lambda x: float(x)', {'__builtins__': None})
>>> f('1.1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <lambda>
NameError: global name 'float' is not defined

Instead, pass in a dictionary containing only the functions you want exposed to those defining the functions:

safe_builtins = dict(
    __builtins__ = None,
    float = float,
    sum = sum,
    custom_func = ...
)

loaded_func = eval("lambda x: ...", safe_builtins)
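
As a rough sketch of how this could fit the JSON files from the question: the snippet below assumes each definition file is a JSON object mapping a name to a lambda expression string. The names `SAFE_GLOBALS` and `load_apply_funcs` and the JSON layout are made up for illustration, not part of any library. It narrows what a definition can call by name, but it should not be treated as a sandbox for untrusted input.

```python
import json

# Hypothetical whitelist: the only names a definition may call.
SAFE_GLOBALS = {"__builtins__": None, "float": float, "sum": sum}


def load_apply_funcs(json_text):
    """Compile each expression string in a JSON object into a callable."""
    definitions = json.loads(json_text)
    funcs = {}
    for name, source in definitions.items():
        # Give each definition its own copy of the whitelist so that
        # one definition cannot alter the globals seen by another.
        funcs[name] = eval(source, dict(SAFE_GLOBALS))
    return funcs


# Assumed file content: the percentage lambda from the question.
definitions_json = '{"percentage": "lambda x: float(x[1]) / float(x[0]) * 100"}'

funcs = load_apply_funcs(definitions_json)
print(funcs["percentage"]([200, 50]))  # 25.0
```

The resulting callables can then be handed to DataFrame.apply() in the same way as the hard-coded lambdas.
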
Matthew Trevor
  • Thanks Matthew. I'll go with the above approach. I've done some further reading and there are other ways to exploit `eval` even with the whitelist but I think I'm prematurely optimising if I choose not to use it. It's for internal use right now and only I will be submitting new definitions. – Phil Sheard May 04 '14 at 10:49
  • @PhilSheard: it is easy to [break out even if `__builtins__ is None`](http://stackoverflow.com/a/9558001/4279) – jfs May 05 '14 at 10:45