
I'm currently writing a parser for simple arithmetic formulas. It needs to support only +, -, * and / on numbers and variables, and should reject everything else. For example:

100.50*num*discount

It's basically used to calculate prices on products.

This is written in Python, and I would like to use Python's own parser for simplicity. The idea is to first parse the input into an AST, then walk the AST and restrict its node types to a small subset, say ast.BinOp, ast.Add, ast.Num, ast.Name and so on.

Currently this works well, except that the floating-point numbers in the AST are not precise. So I want to transform each ast.Num node into something like ast.Call(func=ast.Name(id='Decimal'), ...). The problem is that ast.Num only contains an n field holding the already-parsed floating-point number, and it's not easy to get the original numeric literal from the source code (see: How to get source corresponding to a Python AST node?).
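To make the problem concrete, here is a minimal sketch of what the AST keeps for a numeric literal. It assumes Python 3.8 or later, where the literal shows up as an ast.Constant with a value attribute rather than the ast.Num node with an n field mentioned above.

```python
import ast
from decimal import Decimal

# Parse a single numeric literal and look at what the AST keeps.
node = ast.parse("0.1", mode="eval").body

# Assumption: Python 3.8+, where the literal is an ast.Constant whose
# .value is a float (older versions expose ast.Num with an .n field).
parsed = node.value

# Converting the float afterwards captures its binary approximation,
# not the decimal text that was typed.
from_float = Decimal(parsed)
from_text = Decimal("0.1")

print(from_float == from_text)  # False
print(Decimal("100.50"))        # the string form keeps the trailing zero
```

By the time the AST exists, the text of the literal is gone, so any Decimal conversion has to happen on the source text or use position information.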

Are there any suggestions?

jayven
  • Can you explain what you mean by *original number literal in source code*? – Mazdak Feb 15 '16 at 17:04
  • Sorry, that should be *numeric literal*: https://docs.python.org/2/reference/lexical_analysis.html?highlight=literal#numeric-literals – jayven Feb 16 '16 at 00:31

1 Answer


I'd suggest a two-step approach: in the first step, use Python's tokenize module to convert all floating-point numeric literals in the source into strings of the form 'Decimal(my_numeric_literal)'. Then you can work on the AST in the manner that you suggest.

There's even a recipe for the first step in the tokenize module documentation. To avoid a link-only answer, here's the code from that recipe (along with the necessary imports that the recipe itself is missing):

from cStringIO import StringIO
from tokenize import generate_tokens, untokenize, NAME, NUMBER, OP, STRING

def is_float_literal(s):
    """Identify floating-point literals amongst all numeric literals."""
    s = s.lower()  # Suffixes, prefixes and exponents may be upper case (1E10, 0XE, 1.0J).
    if s.endswith('j'):
        return False  # Exclude imaginary literals.
    elif '.' in s:
        return True  # It's got a '.' in it and it's not imaginary.
    elif s.startswith('0x'):
        return False  # Must be a hexadecimal integer.
    else:
        return 'e' in s  # After excluding hex, 'e' must indicate an exponent.

def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print +21.3e-5*-.1234/81.7'
    >>> decistmt(s)
    "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

    >>> exec(s)
    -3.21716034272e-007
    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7

    """
    result = []
    g = generate_tokens(StringIO(s).readline)   # tokenize the string
    for toknum, tokval, _, _, _  in g:
        if toknum == NUMBER and is_float_literal(tokval):
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result)

The original recipe identifies floating-point literals by checking for the existence of a '.' in the value. That's not entirely bullet-proof, since it excludes literals like '1e10', and includes imaginary literals like 1.0j (which you may want to exclude). I've replaced that check with my own version in is_float_literal above.
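The code above targets Python 2 (cStringIO and the print statement in the doctest). For reference, here is a hedged sketch of the same idea for Python 3.8+, assuming tokenize.generate_tokens is fed a str via io.StringIO. It also lowercases the literal before classifying it, so that forms like 1E10 and 1.0J are handled; that detail is an adjustment in this sketch rather than part of the documentation recipe.

```python
from decimal import Decimal
from io import StringIO
from tokenize import generate_tokens, untokenize, NAME, NUMBER, OP, STRING


def is_float_literal(s):
    """Identify floating-point literals amongst all numeric literals."""
    s = s.lower()  # normalise 1E10, 0XE, 1.0J and similar spellings
    if s.endswith('j'):
        return False  # imaginary literal
    if '.' in s:
        return True
    if s.startswith('0x'):
        return False  # hexadecimal integer; any 'e' here is a digit
    return 'e' in s  # only an exponent can contribute an 'e' now


def decistmt(s):
    """Wrap every float literal in the source string s in Decimal('...')."""
    result = []
    for toknum, tokval, _, _, _ in generate_tokens(StringIO(s).readline):
        if toknum == NUMBER and is_float_literal(tokval):
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')'),
            ])
        else:
            result.append((toknum, tokval))
    # With (type, string) pairs, untokenize chooses its own spacing.
    return untokenize(result)


rewritten = decistmt('100.50*num*discount')
value = eval(rewritten, {'Decimal': Decimal, 'num': 3, 'discount': Decimal('0.1')})
print(value)
```

The exact whitespace in the rewritten string can differ between Python versions, so it is safer to compare evaluated results than the string itself.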

Trying this on your example string, I get this:

>>> expr = '100.50*num*discount'
>>> decistmt(expr)
"Decimal ('100.50')*num *discount "

... which you can now parse into an AST as before:

>>> import ast
>>> tree = ast.parse(decistmt(expr), mode='eval')
>>> # walk the tree to validate, make changes, etc.
... 
>>> ast.dump(tree)
"Expression(body=BinOp(left=BinOp(left=Call(func=Name(id='Decimal', ...

and finally evaluate:

>>> from decimal import Decimal
>>> namespace = {'Decimal': Decimal, 'num': 3, 'discount': Decimal('0.1')}
>>> eval(compile(tree, 'dummy.py', 'eval'), namespace)
Decimal('30.150')
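The "walk the tree to validate" step is left open above. Below is one possible sketch of it for Python 3.8+, not a definitive implementation. The names validate, evaluate and ALLOWED_OPS are hypothetical helpers introduced here. The sketch assumes the input has already been rewritten, so floats appear only as Decimal('...') calls, and it starts from the rewritten string shown earlier.

```python
import ast
from decimal import Decimal

# Hypothetical whitelist: the four arithmetic operators plus unary sign.
ALLOWED_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub)


def validate(node, allowed_names):
    """Raise ValueError unless the tree uses only whitelisted constructs."""
    if isinstance(node, ast.Expression):
        validate(node.body, allowed_names)
    elif isinstance(node, ast.BinOp) and isinstance(node.op, ALLOWED_OPS):
        validate(node.left, allowed_names)
        validate(node.right, allowed_names)
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ALLOWED_OPS):
        validate(node.operand, allowed_names)
    elif isinstance(node, ast.Name) and node.id in allowed_names:
        pass  # a known variable
    elif isinstance(node, ast.Constant) and type(node.value) is int:
        pass  # integer literal; floats were rewritten, bools are rejected
    elif (isinstance(node, ast.Call)
          and isinstance(node.func, ast.Name) and node.func.id == 'Decimal'
          and len(node.args) == 1 and not node.keywords
          and isinstance(node.args[0], ast.Constant)
          and isinstance(node.args[0].value, str)):
        pass  # Decimal('<literal>'), the shape the rewriting step produces
    else:
        raise ValueError('disallowed syntax: %s' % type(node).__name__)


def evaluate(rewritten, variables):
    """Validate a rewritten formula, then evaluate it with no builtins."""
    tree = ast.parse(rewritten.strip(), mode='eval')
    validate(tree, set(variables))
    namespace = dict(variables)
    namespace['Decimal'] = Decimal
    namespace['__builtins__'] = {}
    return eval(compile(tree, '<formula>', 'eval'), namespace)


result = evaluate("Decimal ('100.50')*num *discount ",
                  {'num': 3, 'discount': Decimal('0.1')})
print(result)
```

Anything outside the whitelist (attribute access, other function calls, the ** operator, unknown names) raises ValueError before eval is reached. Note that a user who types Decimal('...') directly in the formula would also pass this check, which may or may not be acceptable for a given application.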
Mark Dickinson