
Input:

from sympy.parsing.sympy_parser import parse_expr

print(parse_expr('1e10', evaluate=False))

Output:

10000000000.0000

It expands the scientific notation, no matter how big the exponent is. However, I expect a straight 1e10.

Even

from sympy import evaluate
with evaluate(False):
    print(parse_expr('1e10', evaluate=False))

returns the same output.

I use parse_expr to parse user-friendly expressions like 2 x + 3^2 into 2*x + 3**2, using standard_transformations, implicit_multiplication, and convert_xor. The parsed output is passed to Python for evaluation and also printed on screen for the user's reference. The expansion of 1e10 to 10000000000.0000 is quite annoying. How can I prevent that?

Jordan He
  • Thanks, but that didn't work, because it would recognize `e10` as a variable but it is not defined. – Jordan He Jun 22 '20 at 05:12
  • I wrote a calculator. When I input, e.g. `1e10 m/s^2`, it uses sympy parse_expr to parse it as `1e10*m/s**2` so that Python can evaluate it. However, `parse_expr('1e10', evaluate=False)` (with some additional transformations) would parse it as `10000000000.0000*m/s**2`, which is equivalent, but when shown on the screen for the user to check it becomes unnecessarily long. – Jordan He Jun 22 '20 at 05:20
  • Here's the calculator I wrote where I came across this problem: https://github.com/chongchonghe/acap . Type in `1e10 m`, python input shows `10000000000.0*m`, but I want either `1e10 m` or `1*10^10 m`. – Jordan He Jun 22 '20 at 05:24
  • "treating `1e10` as a number instead of an implicit multiplication" seems to explain why this occurs. Probably there's nothing I can do with it. – Jordan He Jun 22 '20 at 05:29

1 Answer


Based on the discussion, the concern is not the parsing itself but how the result is displayed. As noted, the default representation cannot be modified, but that does not rule out rendering that specific token in whatever form your application's users want to see. According to the documentation, the printing module provides the level of control necessary.

The following is the most basic implementation to control how Floats are rendered, by extending the provided StrPrinter:

from sympy.printing.str import StrPrinter

class CustomStrPrinter(StrPrinter):
    def _print_Float(self, expr):
        # Render every Float in scientific notation with two decimal places.
        return '{:.2e}'.format(expr)

Now, with the ability to control that output, give it an equation:

>>> from sympy.parsing.sympy_parser import parse_expr
>>> stmt = parse_expr('2*x + 3**2 + 1e10', evaluate=False)
>>> CustomStrPrinter().doprint(stmt)
'2*x + 3**2 + 1.00e+10'

Note that the implementation I provided restricts the output to two decimal places. You will need to modify that for your specific use case; this thread may give you some pointers.
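As a rough illustration of how the digit count in the format spec changes the result, the sketch below formats a few values with `decimal.Decimal` from the standard library. `Decimal` is only a stand-in here: the outputs shown in this answer (for example `1.00e+10`) match how `Decimal` renders the `e` format, but treating it as equivalent to sympy's `Float` is an assumption, and the `sci` helper is hypothetical.

```python
from decimal import Decimal


def sci(text, places):
    """Format a numeric string in scientific notation with `places` decimals.

    Hypothetical helper; Decimal stands in for sympy's Float (assumption).
    """
    return '{:.{p}e}'.format(Decimal(text), p=places)


print(sci('1e10', 2))  # 1.00e+10
print(sci('1e10', 0))  # 1e+10
print(sci('321', 2))   # 3.21e+2
```

Lowering the number of places shortens the output, but a fixed count applies to every float in the expression, so short inputs such as 2.3 would also be forced into scientific notation.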

Also, consulting the parsing transformation section (and the relevant code) may be of use.


Naturally, after looking much deeper into how sympy actually handles parsing (this is my first time looking at this library; I had no prior experience with it before attempting this answer), I noted that the transformations might be of use.

Looking at how parse_expr is implemented, I noted that it calls stringify_expr. Since the transformations are applied there, comparing the token list before and after the transformation step seemed like a good place to attack.

With that, fire up the debugger (some output is truncated):


>>> from sympy.parsing.sympy_parser import parse_expr, standard_transformations
>>> import pdb
>>> pdb.run("parse_expr('1e10', transformations=standard_transformations)")
> <string>(1)<module>()
(Pdb) s
--Call--
> /tmp/env/lib/python3.7/site-packages/sympy/parsing/sympy_parser.py(908)parse_expr()
(Pdb) b 890
(Pdb) c
-> for transform in transformations:
(Pdb) pp tokens
[(2, '1e10'), (0, '')]
(Pdb) clear
(Pdb) b 893
(Pdb) c
> /tmp/env/lib/python3.7/site-packages/sympy/parsing/sympy_parser.py(893)stringify_expr()
-> return untokenize(tokens)
(Pdb) pp tokens
[(1, 'Float'), (53, '('), (2, "'1e10'"), (53, ')'), (0, '')]

(as an aside, those tokens will be untokenized back into Python source before sympy "compiles" that)
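The token round trip seen in the debugger can be reproduced with the standard library alone. This is a minimal sketch rather than sympy's actual code path: it tokenizes the raw input with Python's `tokenize` module, then untokenizes a token list copied from the debugger output above. The variable names are illustrative.

```python
import io
from tokenize import generate_tokens, untokenize, NAME, NUMBER, OP

# Tokenize the raw input the way Python's own tokenizer sees it.
raw = [(tok.type, tok.string)
       for tok in generate_tokens(io.StringIO('1e10').readline)]
first = raw[0]  # the literal arrives as a single NUMBER token

# Token list copied from the debugger output after the transformations ran.
transformed = [(NAME, 'Float'), (OP, '('), (NUMBER, "'1e10'"), (OP, ')')]
source = untokenize(transformed)

print(first == (NUMBER, '1e10'))
print(source)  # spacing may look odd, but it is valid Python source
```

Because `1e10` is one NUMBER token, a transformation sees the original text of the literal and can decide how to wrap it before the source string is rebuilt.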

Naturally, a simple float is transformed into a sympy.core.numbers.Float. Its documentation shows that the precision can be controlled, which suggested to me that this might affect the output. Let's try that:

>>> from sympy.core.numbers import Float
>>> Float('1e10', '1')
1.e+10

So far so good, that looks exactly like what you wanted. However, the default auto_number implementation offers no way to control how the Float nodes are constructed. Since transformations are simply functions, one can be written to rework the tokens that auto_number produces. The following could be done:

from ast import literal_eval
from tokenize import NUMBER, NAME, OP


def restrict_e_notation_precision(tokens, local_dict, global_dict):
    """
    Restrict input e notation precision to the minimum required.

    Should be used after auto_number transformation, as it depends on
    that transform to add the ``Float`` functional call for this to have
    an effect.
    """

    result = []
    float_call = False

    for toknum, tokval in tokens:
        if toknum == NAME and tokval == 'Float':
            # set the flag
            float_call = True

        if float_call and toknum == NUMBER and ('e' in tokval or 'E' in tokval):
            # recover original number before auto_number transformation
            number = literal_eval(tokval)
            # split the significand from the exponent, then drop the
            # decimal point and treat the remaining length as the precision.
            precision = len(number.lower().split('e')[0].replace('.', ''))
            result.extend([(NUMBER, repr(str(number))), (OP, ','),
                (NUMBER, repr(str(precision)))])
            float_call = False
        else:
            result.append((toknum, tokval))

    return result
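To make the precision rule inside the transform concrete, here is the same string manipulation pulled out into a standalone function. The name `significand_digits` is hypothetical and not part of sympy; it only mirrors the expression used in the transform above.

```python
def significand_digits(literal):
    """Count the digits before the exponent marker, ignoring the decimal point."""
    significand = literal.lower().split('e')[0]
    return len(significand.replace('.', ''))


print(significand_digits('1e10'))     # 1
print(significand_digits('1.14e-5'))  # 3
print(significand_digits('3.21E2'))   # 3
print(significand_digits('0.5e3'))    # 2, the leading zero is counted too
```

The last case shows one edge worth knowing about: a leading zero in the significand is counted as a digit, so such inputs get one more digit of precision than strictly needed.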

Now follow the documentation on using parse_expr with custom transformations like the following:

>>> from sympy.parsing.sympy_parser import parse_expr, standard_transformations
>>> transforms = standard_transformations + (restrict_e_notation_precision,)
>>> stmt = parse_expr('2.412*x**2 + 1.14e-5 + 1e10', evaluate=False, transformations=transforms)
>>> print(stmt)
2.412*x**2 + 1.14e-5 + 1.0e+10

That looks exactly like what you wanted. However, there are still some cases that you will need the custom printer to address, such as the following:

>>> stmt = parse_expr('3.21e2*x + 1.3e-3', evaluate=False, transformations=transforms)
>>> stmt
321.0*x + 0.0013

This happens because the default printer switches to fixed-point notation when the exponent is small relative to the precision. The following version of the custom printer checks for, and makes use of, the precision set by the custom transform to get around this:

from sympy.printing.str import StrPrinter

class FloatPrecStrPrinter(StrPrinter):
    """
    A printer that checks for custom precision and output concisely.
    """

    def _print_Float(self, expr):
        if expr._prec != 53:  # not using default precision
            # Heuristic: convert the binary precision into a number of
            # decimal places for the format spec.
            return ('{:.%de}' % ((int(expr._prec) - 5) // 3)).format(expr)
        return super()._print_Float(expr)

Demo:

>>> from sympy.parsing.sympy_parser import parse_expr, standard_transformations
>>> transforms = standard_transformations + (restrict_e_notation_precision,)
>>> stmt = parse_expr('3.21e2*x + 1.3e-3 + 2.7', evaluate=False, transformations=transforms)
>>> FloatPrecStrPrinter().doprint(stmt)
'3.21e+2*x + 1.3e-3 + 2.7'
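The `(prec - 5) // 3` expression in the printer is a heuristic, and its range can be checked without sympy. The sketch below assumes that decimal digits are converted to bits as `round((dps + 1) * log2(10))`, which is my recollection of the conversion mpmath uses; it is consistent with the default of 53 bits for 15 digits, but treat it as an assumption. The function names are illustrative.

```python
BITS_PER_DIGIT = 3.3219280948873626  # log2(10)


def dps_to_prec(dps):
    # Assumed to mirror the decimal-digits -> bits conversion (see lead-in).
    return max(1, int(round((int(dps) + 1) * BITS_PER_DIGIT)))


def heuristic_places(prec):
    # The expression used in FloatPrecStrPrinter._print_Float.
    return (int(prec) - 5) // 3


for dps in range(1, 7):
    prec = dps_to_prec(dps)
    # Columns: significant digits, bits, heuristic result, wanted decimal places
    print(dps, prec, heuristic_places(prec), dps - 1)
```

Under that assumption the two counts agree for one to three significant digits, and the heuristic yields one extra decimal place from four digits onward, so inputs with longer significands may warrant a different conversion.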

One thing of note: my solution above exploits various peculiarities of floats. Because the transform lowers the precision of the parsed numbers, an expression parsed this way will likely not give accurate results if it is used in calculations. The numbers and equations here serve only as a rough guide, and I didn't probe for edge cases; I will leave that to you, as this already took a good chunk of my time to dig into and write up. For example, you may wish to use this for display to users but keep the unadulterated expressions (i.e. parsed without this additional transform) internally for doing arithmetic.
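The loss of accuracy mentioned above can be made concrete without sympy. The sketch below rounds an ordinary Python float to a narrow binary significand; the figure of 13 bits for a three-digit input is an assumption about how the reduced precision is stored, and `round_to_bits` is a hypothetical helper rather than anything sympy provides.

```python
import math


def round_to_bits(x, bits):
    """Round x to a binary significand that is `bits` bits wide."""
    if x == 0:
        return 0.0
    mantissa, exponent = math.frexp(x)    # x == mantissa * 2**exponent
    scaled = round(mantissa * 2 ** bits)  # keep only `bits` bits
    return math.ldexp(scaled / 2 ** bits, exponent)


x = 1.14e-5
approx = round_to_bits(x, 13)  # 13 bits: assumed figure for 3 digits
relative_error = abs(approx - x) / x
print(approx, relative_error)
```

The relative error is bounded by 2**-13 (about 1.2e-4), many orders of magnitude larger than for a full double, which is why keeping a separate full-precision expression for arithmetic is the safer design.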

Anyway, going forward, it is definitely useful to take note of the implementation details of the libraries you depend on. Don't be afraid to use the debugger to figure out what is really happening under the hood, and make full use of the hooks that library authors provide for extension.

metatoaster
  • Beautiful as this answer is, it brings other inconveniences. For instance, `2.3 * x` would become `2.30e+0*x` since all floats are displayed in scientific notation. My ultimate goal, in asking this question, is to find a way to let `parse_expr` keep `1e10` as is, since pure Python understands that, while doing all other parsing. – Jordan He Jun 22 '20 at 17:45
  • @JordanHe Amended the answer to fully address your issues. Also regarding your comment: I did note clearly that **"you will need to modify that for your specific use case"**, though my additional answer now also has a modified form to that. You will need to put in the further work to build your desired solution on top of what research you and others have published, as I can only assume so much. – metatoaster Jun 23 '20 at 04:36
  • Many thanks! Your solution is fully functional. I would definitely implement the first part of your amended answers into my code, both for real calculation and user display. The top priority is to make sure the calculation result matches the user display. As for the second part, since it's only the format of printing floats, it's safe to apply only to user display. Besides, the problem with '3.21e2' is not a big deal because as you increase the exponent to 3 (>= the precision) the e notation will remain. Users won't type in high-precision numbers anyway. – Jordan He Jun 23 '20 at 18:41