ValueError: malformed string when using ast.literal_eval

Question

It is widely known that using eval() is a potential security risk so the use of ast.literal_eval(node_or_string) is promoted

However In python 2.7 it returns ValueError: malformed string when running this example:

>>> ast.literal_eval("4 + 9")

Whereas in python 3.3 this example works as expected:

>>> ast.literal_eval('4+9')
13

Why does it run on python 3 and not python 2? How can I fix it in python 2.7 without using the risky eval() function?

That's definitely not a literal, so I would say Python 3.3 is in the wrong here. Perhaps it's an unintended consequence of peephole optimizations? Edit: Nope, it's special cased and constant folded in `literal_eval`. Only for addition and substraction though, no multiplication or division or other operators. — , Dec 23 '13 at 17:27
See also http://stackoverflow.com/questions/14611352/malformed-string-valueerror-ast-literal-eval-with-string-representation-of-tup — Antti Haapala -- Слава Україні, Jul 11 '16 at 13:59

poke · Accepted Answer · 2017-02-07T12:00:16.157

The reason this doesn’t work on Python 2 lies in its implementation of literal_eval. The original implementation only performed number evaluation for additions and subtractions when the righth operand was a complex number. This is syntactically necessary for complex numbers to be expressed as a literal.

This was changed in Python 3 so that it supports any kind of valid number expression to be on either side of the addition and subtraction. However, the use of literal_eval is still restricted to additions and subtractions.

This is mostly because literal_eval is supposed to be a function that turns a single constant literal (expressed as a string) into a Python object. Kind of like a backwards repr for simple built-in types. Actual expression evaluation is not included, and the fact that this works with Python 3 is just a nice-to-have side effect from its implementation.

In order to evaluate actual expressions, without having to use eval (which we don’t want to), we can write our own expression evaluation algorithm that operates on the AST. This is pretty simple, especially for simple arithmetic operations on numbers (for example to build your own calculator etc.). We simply parse the string into an AST and then evaluate the resulting tree by looking at the different node types and applying the correct operation.

Something like this:

import ast, operator

binOps = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.div,
    ast.Mod: operator.mod
}

def arithmeticEval (s):
    node = ast.parse(s, mode='eval')

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        elif isinstance(node, ast.Str):
            return node.s
        elif isinstance(node, ast.Num):
            return node.n
        elif isinstance(node, ast.BinOp):
            return binOps[type(node.op)](_eval(node.left), _eval(node.right))
        else:
            raise Exception('Unsupported type {}'.format(node))

    return _eval(node.body)

As you can see, this implementation is pretty straightforward. Of course it does not support more complex stuff like exponentiation and some unary nodes yet, but it’s not too difficult to add that. And it works just fine:

>>> arithmeticEval('4+2')
6
>>> arithmeticEval('4*1+2*6/3')
8

You could even introduce more complex things later (for example function calls for things like sin()).

I think in python3, operator.div got deleted. You can use operator.truediv instead. — Keith, Aug 22 '18 at 21:35

score 6 · Answer 2 · edited May 23 '17 at 12:34

It's in order to support complex numbers (since issue 4907). For example, 1 + 2j is parsed by the parser as an expression consisting of an integer literal, an addition operation and an imaginary literal; but since complex numbers are a built-in type, it is desirable for ast.literal_eval to support complex number syntax.

The change in behaviour between 2.x and 3.x is to support writing the complex number the "wrong way round" e.g. 1j + 2; the fact that it allows arbitrary addition or subtraction expressions is a (mostly unintended) side effect.

If you want to parse arbitrary arithmetic expressions, you should parse to a syntax tree (using ast.parse), verify it with a whitelist, and then evaluate.

+1, thanks for the information about the complex numbers! Didn’t think of the addition being part of the syntax necessary to have them as literals. — poke, Dec 23 '13 at 19:02

wim · Answer 3 · 2013-12-23T17:53:13.820

3

Use the source, luke!

http://hg.python.org/cpython/file/2.7/Lib/ast.py#l40 http://hg.python.org/cpython/file/3.2/Lib/ast.py#l39

You will find your answer in there. Specifically, the 2.7 version has the weird restriction on line 70 that the right node of the BinOp is complex.

>>> sys.version
'2.7.3 (default, Sep 26 2013, 20:03:06) \n[GCC 4.6.3]'
>>> ast.literal_eval('9 + 0j')
(9 + 0j)
>>> ast.literal_eval('0j + 9')
ValueError: malformed string

I'm guessing that the intention of 2.7 was to allow literal_eval of complex literals for example numbers like 9 + 0j, and it was never intended to do simple integer additions. Then in python 3 they beefed up the literal_eval to handle these cases.

edited Dec 23 '13 at 17:53

answered Dec 23 '13 at 17:35

wim

338,267
99
616
750

2

The bizarre behaviour in 2.7 was added in [this revision](http://hg.python.org/cpython/rev/c776b6d814c2/) and it seems to have been intended to solve [issue 4907](http://bugs.python.org/issue4907). – Gareth Rees Dec 23 '13 at 17:47
1

what puzzles me most is the fact that why didn't the dev's implement python 3's `ast.literal_eval` into python 2? – K DawG Dec 23 '13 at 17:54
1

Because 2 + 3 is not a literal. What if I monkeypatch `+` to do something different? I agree with the 2.7 behaviour here. It seems to me like `+`/`-` working is just an ugly side effect of handling complex numbers. If adding is supported, why not dividing or multiplying or ... ? – wim Dec 23 '13 at 17:55
Indeed, this was eventually considered a bug in 3.6 and is now fixed in 3.7. [issue31778](https://bugs.python.org/issue31778) – wim Jun 22 '18 at 23:43

dawg · Answer 4 · 2013-12-24T03:03:27.693

It is not too hard to use pyparsing to cobble together a simple expression evaluator.

Suppose you want to eval expression, including parens, of the type of expressions of the following:

2+3
4.0^2+5*(2+3+4)
1.23+4.56-7.890
(1+2+3+4)/5
1e6^2/1e7

This simplification of the SimpleCalc example:

import pyparsing as pp
import re

ex='''\
2+3
4.0^2+5*(2+3+4)
1.23+4.56-7.890
(1+2+3+4)/5
1e6^2/1e7'''

e = pp.CaselessLiteral('E')
dec, plus, minus, mult, div, expop=map(pp.Literal,'.+-*/^')
addop  = plus | minus
multop = mult | div
lpar, rpar=map(pp.Suppress,'()')
p_m = plus | minus

num = pp.Word(pp.nums) 
integer = pp.Combine( pp.Optional(p_m) + num )
floatnumber = pp.Combine( integer +
                       pp.Optional( dec + pp.Optional(num) ) +
                       pp.Optional( e + integer ) )

stack=[]
def pushFirst(s, l, t):
    stack.append( t[0] )

expr=pp.Forward()
atom = ((floatnumber | integer).setParseAction(pushFirst) | 
         ( lpar + expr.suppress() + rpar )
       )

factor = pp.Forward()
factor << atom + pp.ZeroOrMore( ( expop + factor ).setParseAction( pushFirst ) )

term = factor + pp.ZeroOrMore( ( multop + factor ).setParseAction( pushFirst ) )
expr << term + pp.ZeroOrMore( ( addop + term ).setParseAction( pushFirst ) )    

pattern=expr+pp.StringEnd()

opn = { "+" : ( lambda a,b: a + b ),
        "-" : ( lambda a,b: a - b ),
        "*" : ( lambda a,b: a * b ),
        "/" : ( lambda a,b: a / b ),
        "^" : ( lambda a,b: a ** b ) }

def evaluateStack(stk):
    op = stk.pop()
    if op in "+-*/^":
        op2 = evaluateStack(stk)
        op1 = evaluateStack(stk)
        return opn[op](op1, op2)
    elif re.search('^[-+]?[0-9]+$',op):
        return int(op)
    else:
        return float(op)     

for line in ex.splitlines():
    parse=pattern.parseString(line)   
    s=stack[:]
    print('"{}"->{} = {}'.format(line,s,evaluateStack(stack)))

Prints:

"2+3"->['2', '3', '+'] = 5
"4.0^2+5*(2+3+4)"->['4.0', '2', '^', '5', '2', '3', '+', '4', '+', '*', '+'] = 61.0
"1.23+4.56-7.890"->['1.23', '4.56', '+', '7.890', '-'] = -2.1000000000000005
"(1+2+3+4)/5"->['1', '2', '+', '3', '+', '4', '+', '5', '/'] = 2.0
"1e6^2/1e7"->['1E6', '2', '^', '1E7', '/'] = 100000.0

+1 Interesting solution... whats the advantage of using `pyparser` is it extra secure or faster? — K DawG, Dec 24 '13 at 06:43
The main advantage is more control of input and output. Using a parser is as secure as using AST literal_eval. It will validate you input follows what you expect and will only execute what you tell it to. It is a great tool. AST is a great tool too. — dawg, Dec 24 '13 at 14:35

Buz · Answer 5 · 2019-10-15T08:29:10.023

An updated version of the answer from @poke that allows negative numbers in py3.x or other unary operators. So "-3" evaluates to -3 for example, rather than an error.

import ast, operator

binOps = {
ast.Add: operator.add,
ast.Sub: operator.sub,
ast.Mult: operator.mul,
ast.Div: operator.truediv,
ast.Mod: operator.mod
}

unOps = {
ast.USub: operator.neg
}

node = ast.parse(s, mode='eval')

def arithmetic_eval(s):
    binOps = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod
    }

    unOps = {
    ast.USub: operator.neg
    }

    node = ast.parse(s, mode='eval')

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        elif isinstance(node, ast.Str):
            return node.s
        elif isinstance(node, ast.Num):
            return node.n
        elif isinstance(node, ast.BinOp):
            return binOps[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp):
            return unOps[type(node.op)](_eval(node.operand))
        else:
            raise Exception('Unsupported type {}'.format(node))

    return _eval(node.body)

ValueError: malformed string when using ast.literal_eval

5 Answers5

Linked