Best way to enable user to input formula without making a security hole?

Question

I would like to enable the user to input a formula for calculation given some parameters. What is the best way to do this without making a security hole?

Kind of like this:

def generate_bills():
    land_size = 100
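    building_size = 200
    building_class = 1  # "class" is a reserved word in Python, so the variable needs another name
    formula = "(0.8*land_size)+building_size+(if class==1 10 else if class==2 5 else 2)"
    bill = calculate(formula, {'land_size': land_size, 'building_size': building_size, 'class': building_class})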

What exactly is input by the user? Formula and parameters? Can/must they use all parameters? — Luigi, May 07 '14 at 04:08
They don't have to use all the parameters. They only input the formula, values of the parameters are provided by the system. — William Wino, May 07 '14 at 04:11
One more clarification--is `calculate` your own function or is it from an external module? Either way I think its code is relevant to my answer to your question. — Luigi, May 07 '14 at 04:16
If your formulas are valid python, you might be able to get away with using [`ast`](https://docs.python.org/2/library/ast.html). See: http://stackoverflow.com/a/20748308/65295 — Seth, May 07 '14 at 04:23
possible duplicate of [Python: make eval safe](http://stackoverflow.com/questions/3513292/python-make-eval-safe) — John Zwinck, May 07 '14 at 05:52

score 0 · Answer 1 · edited May 23 '17 at 11:50

The easiest way to do this is to sanitize your input. Basically, you want to pay attention ONLY to the parameters and operators you define and discard everything else. Sanitizing a numerical equation follows a few simple steps:

1. Extract the static, known parts of the equation (variable names, operators).
2. Extract numerical values (which should be allowed if the user can define their own function).
3. Reconstruct the function from these extracted parts. This discards everything that you do not handle and that could be problematic when passed to Python's ast or eval.
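
The three steps above can be sketched compactly with a tokenizing regular expression. This is only an illustration, not the code from this answer: the names `ALLOWED_PARAMS`, `TOKEN_RE` and `sanitize` are made up for the example, and it assumes that numbers are plain decimals and that any identifier not on the whitelist should simply be dropped.

```python
import re

# Hypothetical whitelist; the names mirror the parameters from the question.
ALLOWED_PARAMS = ("land_size", "building_size")

# One alternative per token kind: an identifier, a plain decimal number,
# or a single operator character. Anything else is skipped by findall().
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[0-9]+(?:\.[0-9]+)?|[+\-*/()]")

def sanitize(formula):
    """Rebuild the formula from whitelisted names, numbers and operators only."""
    kept = []
    for token in TOKEN_RE.findall(formula):
        if token[0].isalpha() or token[0] == "_":
            # Step 1: identifiers survive only if they are known parameters.
            if token in ALLOWED_PARAMS:
                kept.append(token)
        else:
            # Steps 1 and 2: operators and numeric literals are kept as-is.
            kept.append(token)
    # Step 3: reconstruct the formula in the original token order.
    return "".join(kept)

print(sanitize("building_size *    500 + land_size-20+building_size."))
print(sanitize("1337+land_size h4x'; DROP TABLE members;"))
```

The result contains only whitelisted tokens, but it is not guaranteed to be a well-formed expression (for example, stray parentheses are kept), so whatever evaluates it should still be prepared for syntax errors.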

Here's a pretty robust sanitizer I adapted from another project. The code is below, but here are some sample inputs and outputs:

In an ideal case, input and output are identical:

enter func: building_size*40+land_size*20-(building_size+land_size)
building_size*40+land_size*20-(building_size+land_size)

However, were the user to use spaces/periods/tabs/even newlines (gasp), the output is still beautiful:

enter func: 
    building_size *    500 + land_size-20+building_size.
building_size*500+land_size-20+building_size

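And no matter what kind of misguided, malicious injection your user tries, only allowed tokens make it into the output. Note that `(` and `)` are themselves allowed operators, so stray parentheses survive and the result is not guaranteed to be a valid expression:

enter func: land_size + 2 * building_size quit()
land_size+2*building_size()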

enter func: 1337+land_size h4x'; DROP TABLE members;
1337+land_size

What's more, you can very easily modify the code to feed the actual values into the equation once it is sanitized. What I mean by this is going from `land_size+2*building_size` to `100+2*200` with a simple replace. The resulting string contains only numbers and operators, so it can be handed to `eval` or parsed with `ast`.
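
As a rough sketch of that last step, the snippet below substitutes the values and then evaluates the result by walking the tree produced by `ast.parse` instead of calling `eval`. It is not part of the original sanitizer: `substitute` and `evaluate` are hypothetical helper names, only `+ - * /`, unary signs and parentheses are supported, and Python 3.8 or later is assumed (for `ast.Constant`).

```python
import ast
import operator
import re

# Operators this sketch is willing to evaluate; everything else is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def substitute(formula, values):
    """Replace whole-word parameter names with their numeric values."""
    if not values:
        return formula
    pattern = r"\b(?:%s)\b" % "|".join(re.escape(name) for name in values)
    return re.sub(pattern, lambda m: repr(values[m.group(0)]), formula)

def evaluate(expression):
    """Evaluate an expression made only of numbers and + - * / ( )."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        # Calls, names, attribute access, etc. all end up here.
        raise ValueError("unsupported element: %s" % type(node).__name__)
    return walk(ast.parse(expression, mode="eval"))

values = {"land_size": 100, "building_size": 200}
expression = substitute("land_size+2*building_size", values)
print(expression)
print(evaluate(expression))
```

Because the walker rejects every node type it does not recognise, an input such as `100+2*200()` raises `ValueError` rather than being executed, and a malformed string raises `SyntaxError` from `ast.parse`.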

The code is below:

import re

# find all indices of a given char
def find_spans(ch, s):
    return [(i, i + 1) for i, ltr in enumerate(s) if ltr == ch]

# check to see if an unknown is a number
def is_number(s):
    try:
        float(s)
    except ValueError:
        return False
    return True

# these are the params you will allow
# change these to add/remove parameters/operators
allowed_params = ['land_size', 'building_size']
operators = ['+', '-', '*', '/', '(', ')']

# get input (raw_input is Python 2; on Python 3 use input() instead)
in_formula = raw_input('enter func: ')

# dictionary that will hold every allowed function element found in the input and its position(s)
found_params = {}

# extract param indices
for param in allowed_params:
    found_params[param] = [i.span() for i in re.finditer(param, in_formula)]

# extract operator indices
for op in operators:
    found_params[op] = find_spans(op,in_formula)

# get all index regions that are "approved", that is, they are either a param or operator
allowed_indices = sorted([j for i in found_params.values() for j in i])

# these help remove anything unapproved at beginning or end
allowed_indices.insert(0,(0,0))
allowed_indices.append((len(in_formula),len(in_formula)))

# find all index ranges that have not been approved
unknown_indices = [(allowed_indices[i-1][1], allowed_indices[i][0]) for i in range(1,len(allowed_indices)) if allowed_indices[i][0] != allowed_indices[i-1][1]]

# of all the unknowns, check to see if any are numbers
numbers_indices = [(''.join(in_formula[i[0]:i[1]].split()),i) for i in unknown_indices if is_number(in_formula[i[0]:i[1]])]

# add these to our final dictionary
for num in numbers_indices:
    try:
        found_params[num[0]].append(num[1])
    except KeyError:
        found_params[num[0]] = [num[1]]

# get final order of extracted parameters
final_order = sorted([(i[0],key) for key in found_params.keys() for i in found_params[key]])

# put all function elements back into a string
final_function = ''.join([i[1] for i in final_order])

#
# here you could replace the parameters in the final function with their actual values
# and then evaluate using eval()
#

print(final_function)

Let me know if something doesn't make sense and I'd be glad to explain it.
