44

We all know that eval is dangerous, even if you hide dangerous functions, because you can use Python's introspection features to dig down into things and re-extract them. For example, even if you delete __builtins__, you can retrieve them with

[c for c in ().__class__.__base__.__subclasses__()  
 if c.__name__ == 'catch_warnings'][0]()._module.__builtins__

However, every example I've seen of this uses attribute access. What if I disable all builtins, and disable attribute access (by tokenizing the input with a Python tokenizer and rejecting it if it has an attribute access token)?

And before you ask, no, for my use-case, I do not need either of these, so it isn't too crippling.

What I'm trying to do is make SymPy's sympify function safer. Currently it tokenizes the input, does some transformations on it, and evals it in a namespace. But it's unsafe because it allows attribute access (even though it really doesn't need it).
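
The tokenizer check described above could look roughly like the following. This is only a sketch, assuming the standard library's tokenize module; the helper name contains_dot_token is made up for illustration and is not SymPy's actual code.

```python
import io
import tokenize


def contains_dot_token(source):
    """Return True if the token stream contains a '.' operator token.

    Hypothetical helper: float literals such as 1.5 are NUMBER tokens,
    so they do not count as attribute access here.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return any(tok.type == tokenize.OP and tok.string == '.' for tok in tokens)


print(contains_dot_token('().__class__'))
print(contains_dot_token('1.5 + x'))
```

Anything for which this returns True would be rejected before it reaches eval.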

asmeurer
  • 9
    That depends on what you mean by dangerous... I imagine that an attacker could create an expression to make a _really_ big integer that would cause you to run out of memory.... – mgilson Mar 04 '16 at 19:58
  • 3
    @mgilson that's a valid point. I suppose it's possible to protect against this by putting memory/time guards on your application, but definitely worth being aware of. – asmeurer Mar 04 '16 at 20:00
  • 9
    I think this also depends on the locals that you pass in... `a + b` is only as safe as `a.__add__` and `b.__radd__` are safe... – mgilson Mar 04 '16 at 20:06
  • 6
    is `ast.literal_eval` a possibility or do you need more than that but still not attributes? What about calls? – Jason S Mar 04 '16 at 20:10
  • 1
    @mgilson worth adding that as an answer. – asmeurer Mar 04 '16 at 20:18
  • 6
    related: http://blog.delroth.net/2013/03/escaping-a-python-sandbox-ndh-2013-quals-writeup/ – wim Mar 04 '16 at 20:25
  • Another related question: http://stackoverflow.com/q/13066594/510937 – Bakuriu Mar 04 '16 at 21:41
  • 1
    [Kinda relevant](http://codegolf.stackexchange.com/questions/61115/make-your-language-unusable#comment147151_61125) – Loovjo Mar 04 '16 at 23:52

6 Answers

35

I'm going to mention one of the new features of Python 3.6 - f-strings.

They can evaluate expressions,

>>> eval('f"{().__class__.__base__}"', {'__builtins__': None}, {})
"<class 'object'>"

but the attribute access won't be detected by Python's tokenizer:

0,0-0,0:            ENCODING       'utf-8'        
1,0-1,1:            ERRORTOKEN     "'"            
1,1-1,27:           STRING         'f"{().__class__.__base__}"'
2,0-2,0:            ENDMARKER      '' 
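
For comparison, here is a rough sketch (not part of the original answer) showing that the parsed AST does expose the attribute access inside an f-string on Python 3.6+. The helper name has_attribute_access is hypothetical. It only detects syntactic attribute access; a string handed to a nested eval call stays opaque to it.

```python
import ast


def has_attribute_access(source):
    """Return True if the parsed expression contains an ast.Attribute node."""
    tree = ast.parse(source, mode='eval')
    return any(isinstance(node, ast.Attribute) for node in ast.walk(tree))


# The expression inside the f-string is parsed into real AST nodes.
print(has_attribute_access('f"{().__class__.__base__}"'))

# A plain string argument is just a constant, so this is not detected.
print(has_attribute_access("eval('().__class__')"))
```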
vaultah
  • 1
    Well, you simply have to consider the contents of all f-strings and check them (or more safely: disallow them). – Bakuriu Mar 04 '16 at 21:40
  • 30
    This really highlights how much of a moving target trying to secure `eval` is. Right now, it's f-strings. Who knows what 3.7 will bring? – user2357112 Mar 04 '16 at 22:51
  • 1
    While it doesn't show up with the stdlib's `tokenize` module, the expression inside an f-string will show up in the AST when parsing `f"{some code}"`. – Arminius Apr 03 '18 at 23:11
  • 1
    The expression in an f-string doesn't have to contain any attribute access nodes for the f-string to do attribute access - e.g. `f"{eval('()' + chr(46) + '__class__')}"`. – kaya3 Jan 13 '20 at 17:49
22

It is possible to construct a return value from eval that would throw an exception outside eval if you tried to print, log, repr, anything:

eval('''((lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args))))
        (lambda f: lambda n: (1,(1,(1,(1,f(n-1))))) if n else 1)(300))''')

This creates a nested tuple of the form (1,(1,(1,(1...; that value cannot be printed (on Python 3), str'ed or repr'ed; all attempts to debug it would lead to

RuntimeError: maximum recursion depth exceeded while getting the repr of a tuple

pprint and saferepr fail too:

...
  File "/usr/lib/python3.4/pprint.py", line 390, in _safe_repr
    orepr, oreadable, orecur = _safe_repr(o, context, maxlevels, level)
  File "/usr/lib/python3.4/pprint.py", line 340, in _safe_repr
    if issubclass(typ, dict) and r is dict.__repr__:
RuntimeError: maximum recursion depth exceeded while calling a Python object

Thus there is no safe built-in function to stringify this: the following helper could be of use:

def excsafe_repr(obj):
    try:
        return repr(obj)
    except Exception:  # covers the recursion error; lets KeyboardInterrupt through
        return object.__repr__(obj).replace('>', ' [exception raised]>')

Then there is the problem that the print statement in Python 2 does not actually use str/repr, so the recursion checks they perform do not protect you. That is, you cannot str or repr the return value of the lambda monster above, but the ordinary print statement (not print_function!) prints it nicely. However, this can be exploited to generate a SIGSEGV on Python 2 if you know the value will be printed using the print statement:

print eval('(lambda i: [i for i in ((i, 1) for j in range(1000000))][-1])(1)')

crashes Python 2 with SIGSEGV. This is WONTFIX in the bug tracker. Thus never use the print statement if you want to be safe; use from __future__ import print_function!


This is not a crash, but

eval('(1,' * 100 + ')' * 100)

when run, outputs

s_push: parser stack overflow
Traceback (most recent call last):
  File "yyy.py", line 1, in <module>
    eval('(1,' * 100 + ')' * 100)
MemoryError

The MemoryError can be caught, since it is a subclass of Exception. The parser has some really conservative limits to avoid crashes from stack overflows (pun intended). However, s_push: parser stack overflow is written to stderr by C code and cannot be suppressed.


And just yesterday I asked why Python 3.4 would not be fixed for a crash from

% python3  
Python 3.4.3 (default, Mar 26 2015, 22:03:40) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     def f(self):
...         nonlocal __x
... 
[4]    19173 segmentation fault (core dumped)  python3

and Serhiy Storchaka's answer confirmed that Python core devs do not consider SIGSEGV on seemingly well-formed code a security issue:

Only security fixes are accepted for 3.4.

Thus it can be concluded that it can never be considered safe to execute any code from a third party in Python, sanitized or not.

And Nick Coghlan then added:

And as some additional background as to why segmentation faults provoked by Python code aren't currently considered a security bug: since CPython doesn't include a security sandbox, we're already relying entirely on the OS to provide process isolation. That OS level security boundary isn't affected by whether the code is running "normally", or in a modified state following a deliberately triggered segmentation fault.

  • "*Thus there is no safe way to dump this value into logs, or anything - any attempt would lead to further exceptions being thrown.*" Bug-worthy? – cat Mar 05 '16 at 01:23
  • 3
    See, Haskell doesn't have that issue :-D Even the weirdest of stuff either bottoms out and can be easily caught or stringifies to an ordinary infinitely long string, which you can print an arbitrarily long part of. – John Dvorak Mar 05 '16 at 09:25
  • The first one can be achieved in 3.6 with `f-strings`, no eval needed. – cat Mar 05 '16 at 19:11
  • @AnttiHaapala The accepted answer demonstrates that's exactly the purpose of f-strings. – cat Mar 10 '16 at 02:32
  • By the way, the `nonlocal __x` thing was fixed by 3.6.0a0, though it's oddly still present in the packaged (`apt`) version of 3.5, despite the fix being 2 months old – cat Mar 10 '16 at 02:34
  • @tac because it is not considered a security fix by the packagers either. – Antti Haapala -- Слава Україні Mar 10 '16 at 07:34
  • Not related to the question, but in Chrome, if I mouse over your first "eval", then if my cursor is anywhere on the first line after the opening parenthesis (, my mouse cursor turns into an arrow instead of the normal cursor that your mouse has that kind of looks like an uppercase I (with serifs). Peculiar. – acdr Oct 31 '19 at 15:47
14

Users can still DoS you by inputting an expression that evaluates to a huge number, which would fill your memory and crash the Python process, for example

'10**10**100'

I am definitely still curious if more traditional attacks, like recovering builtins or creating a segfault, are possible here.

EDIT:

It turns out, even Python's parser has this issue.

lambda: 10**10**100

will hang, because it tries to precompute the constant.
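
To get a feel for why 10**10**100 can never finish, the size of the result can be estimated without computing it. The helper name approx_bits below is just for illustration, and the estimate uses floating-point logarithms, so treat it as an order-of-magnitude figure.

```python
import math


def approx_bits(base, exponent):
    """Approximate the bit length of base**exponent without evaluating it."""
    return exponent * math.log2(base)


# ** is right-associative, so 10**10**100 means 10**(10**100).
bits = approx_bits(10, 10**100)
print(bits)
```

The estimate is on the order of 10**100 bits, far more than any machine can store.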

asmeurer
  • The only way to avoid this is to use a timeout that blocks the execution of the thread that is running that after x time or when too many allocations are performed (which could be pretty hard to do...) – Bakuriu Mar 04 '16 at 21:50
  • 1
    @Bakuriu: If you're working within Python, that's going to be a lot harder because this is likely to be evaluated while holding the GIL. For a number that big, there's also a nonzero chance of OOMing, depending on the circumstances. – Kevin Mar 05 '16 at 05:24
9

Here is a safe_eval example which ensures that the evaluated expression does not contain unsafe constructs. It does not take the literal_eval approach of interpreting the AST; instead it whitelists the AST node types and uses the real eval if the expression passes the check.

# license: MIT (C) tardyp
import ast
import unittest


def safe_eval(expr, variables):
    """
    Safely evaluate a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None. safe operators are allowed (and, or, ==, !=, not, +, -, ^, %, in, is)
    """
    _safe_names = {'None': None, 'True': True, 'False': False}
    # Python 3.8+ parses literals as 'Constant'; 'Num', 'Str' and
    # 'NameConstant' are the node names used by older versions.
    _safe_nodes = [
        'Add', 'And', 'BinOp', 'BitAnd', 'BitOr', 'BitXor', 'BoolOp',
        'Compare', 'Constant', 'Dict', 'Eq', 'Expr', 'Expression', 'For',
        'Gt', 'GtE', 'Is', 'In', 'IsNot', 'LShift', 'List',
        'Load', 'Lt', 'LtE', 'Mod', 'Name', 'NameConstant', 'Not', 'NotEq', 'NotIn',
        'Num', 'Or', 'RShift', 'Set', 'Slice', 'Str', 'Sub',
        'Tuple', 'UAdd', 'USub', 'UnaryOp', 'boolop', 'cmpop',
        'expr', 'expr_context', 'operator', 'slice', 'unaryop']
    node = ast.parse(expr, mode='eval')
    for subnode in ast.walk(node):
        subnode_name = type(subnode).__name__
        if isinstance(subnode, ast.Name):
            if subnode.id not in _safe_names and subnode.id not in variables:
                raise ValueError("Unsafe expression {}. contains {}".format(expr, subnode.id))
        if subnode_name not in _safe_nodes:
            raise ValueError("Unsafe expression {}. contains {}".format(expr, subnode_name))

    return eval(expr, variables)



class SafeEvalTests(unittest.TestCase):

    def test_basic(self):
        self.assertEqual(safe_eval("1", {}), 1)

    def test_local(self):
        self.assertEqual(safe_eval("a", {'a': 2}), 2)

    def test_local_bool(self):
        self.assertEqual(safe_eval("a==2", {'a': 2}), True)

    def test_lambda(self):
        self.assertRaises(ValueError, safe_eval, "lambda : None", {'a': 2})

    def test_bad_name(self):
        self.assertRaises(ValueError, safe_eval, "a == None2", {'a': 2})

    def test_attr(self):
        self.assertRaises(ValueError, safe_eval, "a.__dict__", {'a': 2})

    def test_eval(self):
        self.assertRaises(ValueError, safe_eval, "eval('os.exit()')", {})

    def test_exec(self):
        self.assertRaises(SyntaxError, safe_eval, "exec 'import os'", {})

    def test_multiply(self):
        self.assertRaises(ValueError, safe_eval, "'s' * 3", {})

    def test_power(self):
        self.assertRaises(ValueError, safe_eval, "3 ** 3", {})

    def test_comprehensions(self):
        self.assertRaises(ValueError, safe_eval, "[i for i in [1,2]]", {'i': 1})
tardyp
8

I don't believe Python is designed to have any security against untrusted code. Here's an easy way to induce a segfault via stack overflow (on the C stack) in the official Python 2 interpreter:

eval('()' * 98765)

From my answer to the "Shortest code that returns SIGSEGV" Code Golf question.

feersum
1

Controlling the locals and globals dictionaries is extremely important. Otherwise, someone could just pass in eval or exec, and call it recursively:

safe_eval('''e("""[c for c in ().__class__.__base__.__subclasses__() 
    if c.__name__ == \'catch_warnings\'][0]()._module.__builtins__""")''', 
    globals={'e': eval})

The expression in the recursive eval is just a string.

You also need to set the eval and exec names in the global namespace to something that isn't the real eval or exec. The global namespace is important. If you use a local namespace, anything that creates a separate namespace, such as comprehensions and lambdas, will work around it:

safe_eval('''[eval("""[c for c in ().__class__.__base__.__subclasses__()
    if c.__name__ == \'catch_warnings\'][0]()._module.__builtins__""") for i in [1]][0]''', locals={'eval': None})

safe_eval('''(lambda: eval("""[c for c in ().__class__.__base__.__subclasses__()
    if c.__name__ == \'catch_warnings\'][0]()._module.__builtins__"""))()''',
    locals={'eval': None})

Again, here, safe_eval only sees a string and a function call, not attribute accesses.
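
The scoping behaviour behind this can be checked with plain eval. The sketch below uses len as a harmless stand-in for eval, and the variable names are only illustrative.

```python
shadow = {'len': None}

# Top-level names in the evaluated code are looked up in locals first,
# so the shadowed value is found.
top_level = eval('len', {}, dict(shadow))

# A lambda body does not consult the locals dict passed to eval;
# it falls through to globals and then to the real builtins.
via_lambda = eval('(lambda: len)()', {}, dict(shadow))

# Shadowing the name in globals covers the lambda as well.
via_lambda_globals = eval('(lambda: len)()', dict(shadow))

print(top_level, via_lambda, via_lambda_globals)
```

Only the globals variant blocks the name inside the lambda, which is why the override has to live in the global namespace.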

You also need to clear out the safe_eval function itself, if it has a flag to disable safe parsing. Otherwise you could simply do

safe_eval('safe_eval("<dangerous code>", safe=False)')
asmeurer