
I want to code a unit converter, and I need to separate the value from the unit in the input string.

To provide a user-friendly experience, I want the user to be able to enter the value and the unit in the same string. My problem is that I need to extract the number and the letters so that I can store the value and the unit in two different variables. To extract the letters I used the in operator, and that works properly. I also found a solution for getting the number from the input, but it doesn't work for values with exponents.

a = str(input("Type in your wavelength: "))
if "mm" in a:
    print("Unit = Millimeter")

b = float(a.split()[0])

Storing simple inputs like 567 mm as a float in b works, but I also want to handle inputs like 5*10**6 mm, and for those I get:

could not convert string to float: '5*10**6'.

So what can I use to extract more complex numbers like this into a float?

Mad Physicist
Mika R.
  • [Complex numbers](https://en.wikipedia.org/wiki/Complex_number) are a very specific thing you're probably not interested in here. Do I understand correctly that you want to be able to evaluate arbitrary arithmetic operations, like `100 + 16 - 34 mm`? – Norrius Apr 14 '19 at 12:33
  • Also, which arithmetic operations do you need? Is it something simple (e.g. only + - * /), or does it also include more complex things like 4^3 (i.e. 4*4*4)? – Samleo Apr 14 '19 at 12:37
  • Can you please be clearer about the question? Do you want to save the product of 5 and 10**6 as the float? – Swastik Udupa Apr 14 '19 at 13:01

4 Answers


Traditionally, in Python, as in many other languages, the exponent of a floating-point literal is prefixed by the letter e or E. While 5 * 10**6 is not a valid floating-point literal, 5e6 most definitely is.
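A quick interpreter check of this (a minimal sketch; the strings are just examples): float() parses e notation directly, but it does not evaluate a Python expression.

```python
# float() parses scientific notation written with e or E
value = float("5e6")
print(value)  # 5000000.0

# ...but it does not evaluate arithmetic, so this raises ValueError
try:
    float("5*10**6")
except ValueError as exc:
    print(exc)  # could not convert string to float: '5*10**6'
```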

This is something to keep in mind for the future, but it won't solve your issue with the in operator. The problem is that in can only check whether a substring you already know about is present. What if your input were 5e-8 km instead?
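To illustrate (a small sketch; the sample strings are made up, and split_value_and_unit is a hypothetical helper that assumes a space separates the number from the unit): a substring test only recognizes units you listed in advance and can even match the wrong one, whereas splitting on the last whitespace returns whatever unit was typed.

```python
# `in` only finds substrings you already know about, and can misfire:
print("mm" in "5e-8 km")  # False: km was never checked
print("m" in "5 mm")      # True, although the unit is mm, not m

# Hypothetical helper. Assumption: the unit is the last whitespace-separated word.
def split_value_and_unit(text):
    number, unit = text.strip().rsplit(maxsplit=1)
    return number, unit

print(split_value_and_unit("5e-8 km"))       # ('5e-8', 'km')
print(split_value_and_unit("5 * 10**6 mm"))  # ('5 * 10**6', 'mm')
```

The number part still has to be converted separately, for example with float() when it is written in e notation.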

You should start by coming up with an unambiguously clear definition of how you identify the boundary between number and units in a string. For example, units could be the last contiguous bit of non-digit characters in your string.

You could then split the string using a regular expression. The first part may be more than a plain number, so you need some way to evaluate it. For a numeric literal such as 5e6, something as simple as ast.literal_eval will do, but note that it does not evaluate arithmetic such as 5 * 10**6. The more complicated your expressions can be, the more complicated your parser will have to be as well.
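As BPL's comment below this answer points out, ast.literal_eval only handles literals. A minimal check: it accepts a numeric literal such as 5e6 but rejects a multiplication or a power.

```python
from ast import literal_eval

print(literal_eval("5e6"))  # 5000000.0

try:
    literal_eval("5 * 10**6")
except ValueError as exc:
    # literal_eval only evaluates literals, not operators like * or **
    print("rejected:", exc)
```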

Here's an example to get you started:

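from ast import literal_eval
import re

# Number part: everything up to the last digit or dot;
# unit part: the trailing run of non-digit characters
pattern = re.compile(r'(.*[\d\.])\s*(\D+)')

data = '5e6 mm'
match = pattern.fullmatch(data)
if not match:
    raise ValueError('Invalid Expression')
num, units = match.groups()
# literal_eval accepts numeric literals such as 5e6, but not arithmetic like 5 * 10**6
num = literal_eval(num)
print(num, units)  # 5000000.0 mm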
Mad Physicist
  • `ast.literal_eval` is very limited: it raises exceptions such as `ValueError: malformed node or string` when you use it on simple arithmetic expressions – BPL Apr 14 '19 at 13:34
  • Thank you very much. Just using e6 instead of 10**6 solved the problem. – Mika R. Apr 14 '19 at 13:41

It seems that you are looking for the eval function, as noted in @Rasgel's answer (see the Python documentation for eval).

As some people have pointed out, it poses a big security risk if the input can come from someone you don't trust.

To reduce this risk, I can think of two ways:

1. Combine eval with regex

If you only want to do basic arithmetic operations like addition, subtraction and maybe 2**4 or something like that, you can use a regex to first remove any characters that are neither digits nor arithmetic operators.

import re

a = input("Type in your wavelength: ")

if "mm" in a:
    print("Unit = Millimeter")

# After parsing the units,
# remove anything other than digits, +, -, *, /, . (decimal point) and ().
# (! for factorial is not valid Python syntax, so it is left out.)
# If you require any other symbols, add them in.
# Note that letters are removed too, so e notation such as 5e6 turns into 56.

pruned_a = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", a)

result = eval(pruned_a)

2. Make sure eval cannot access the local or global variables (or the built-ins) of your Python code.

result = eval(expression, {'__builtins__': None}, {})

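(The line above is from another Stack Overflow answer, Math Expression Evaluation; there might be other solutions there that interest you. On its own it is not a true sandbox, since attribute tricks can still reach dangerous objects, which is why combining it with the character filter is worthwhile.)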

Combined

import re

a = input("Type in your wavelength: ")

if "mm" in a:
    print("Unit = Millimeter")

# After parsing the units,
# remove anything other than digits, +, -, *, /, . (decimal point) and ().
# (! for factorial is not valid Python syntax, so it is left out.)
# If you require any other symbols, add them in.
# Note that letters are removed too, so e notation such as 5e6 turns into 56.

pruned_a = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", a)

result = eval(pruned_a, {'__builtins__': None}, {})  # to be extra safe :)
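One caveat: the character filter also deletes letters, so an input in e notation such as 5e6 would be reduced to 56. A possible refinement, sketched here under the assumption that the unit has already been separated from the number (evaluate_number is a hypothetical helper name), is to try float() first and only fall back to the filtered eval for arithmetic expressions.

```python
import re

def evaluate_number(text):
    """Hypothetical helper: evaluate the numeric part of the input (unit already removed)."""
    try:
        # Handles plain numbers and e notation such as 5e6 or 5e-8
        return float(text)
    except ValueError:
        # Fall back to the filtered eval for expressions such as 5*10**6
        pruned = re.sub(r'[^0-9\*\+\-\/\.\(\)]', "", text)
        return float(eval(pruned, {'__builtins__': None}, {}))

print(evaluate_number("5e6"))        # 5000000.0
print(evaluate_number("5 * 10**6"))  # 5000000.0
```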
Samleo

There are many ways to tackle this simple problem: str.split, regular expressions, eval, ast.literal_eval... Here I propose writing your own safe routine that evaluates simple mathematical expressions:

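import ast
import operator


def safe_eval(s):
    """Evaluate a simple arithmetic expression without calling eval()."""
    bin_ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Mod: operator.mod,
        ast.Pow: operator.pow,
    }
    # Unary plus/minus, so that negative numbers such as -5 are accepted
    unary_ops = {
        ast.UAdd: operator.pos,
        ast.USub: operator.neg,
    }

    node = ast.parse(s, mode='eval')

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # ast.Constant replaces the deprecated ast.Num/ast.Str (Python 3.8+)
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in bin_ops:
            return bin_ops[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in unary_ops:
            return unary_ops[type(node.op)](_eval(node.operand))
        else:
            raise ValueError('Unsupported expression: {}'.format(ast.dump(node)))

    return _eval(node.body)


if __name__ == '__main__':
    text = input("Type in your wavelength: ")
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError("expected input: <wavelength expression> <unit>")

    # Everything except the last token is the number; the last token is the unit
    wavelength = safe_eval("".join(tokens[:-1]))
    unit = tokens[-1]

    print(f"You've typed {wavelength} in {unit}")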

I'd also recommend reading the post "Why is using 'eval' a bad practice?".

BPL

In case you have a string like 5*10**6 and want to convert this number into a float, you can use the eval() function.

>>> float(eval('5*10**6'))
5000000.0
Rasgel
  • Using `eval` is problematic. If OP's user is just themselves, or this is otherwise toy code, then this is okay (although even there you might not want to foster bad habits), but it opens up security risks if, e.g., the code is meant to run server-side and the user is just some random stranger. – John Coleman Apr 14 '19 at 12:41
  • Sure, the developer needs to be aware of these security risks. He could add a condition verifying it's a genuine conversion and not malicious code (for example, only two letters in the string, corresponding to a unit). – Rasgel Apr 14 '19 at 12:50