The easiest way to do this is by sanitizing your input. Basically, you want to pay attention ONLY to the parameters you define and discard everything else. Sanitizing a numerical equation takes a few simple steps:
- Extract static, known equation parts (variable names, operators)
- Extract numerical values (which should be allowed if the user can define their own function).
- Reconstruct the function from these extracted parts. This discards everything you do not handle, which could otherwise be problematic when passed to Python's `ast` or `eval`.
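As a rough illustration of those three steps, here is a minimal sketch that uses a single regular expression instead of the index-based approach in the full code further down. The names `ALLOWED_PARAMS`, `TOKEN_RE`, and `sanitize` are made up for this example, and it only recognizes plain decimal numbers.

```python
import re

# Hypothetical whitelist for this sketch; adjust to your own parameters.
ALLOWED_PARAMS = ("land_size", "building_size")

# One alternative per token type: known names, plain decimal numbers, operators.
# The lookarounds stop a token from matching inside a larger word (e.g. "h4x").
TOKEN_RE = re.compile(
    r"(?<![\w.])(?:" + "|".join(map(re.escape, ALLOWED_PARAMS)) + r")(?!\w)"
    r"|(?<![\w.])\d+(?:\.\d+)?(?!\w)"
    r"|[-+*/()]"
)

def sanitize(formula):
    """Keep only whitelisted tokens, in their original order."""
    return "".join(TOKEN_RE.findall(formula))

print(sanitize("1337+land_size h4x'; DROP TABLE members;"))  # 1337+land_size
```

Like any whitelist of this kind, the sketch only filters tokens. It does not check that what remains is a well-formed expression, so stray operators or parentheses can survive.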
Here's a pretty robust sanitizer I adapted from another project. The code is below, but here are some sample inputs and outputs:
In an ideal case, input and output are identical:
enter func: building_size*40+land_size*20-(building_size+land_size)
building_size*40+land_size*20-(building_size+land_size)
However, if the user adds spaces, periods, tabs, or even newlines (gasp), the output is still clean:
enter func:
building_size * 500 + land_size-20+building_size.
building_size*500+land_size-20+building_size
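Misguided or malicious input is reduced to the allowed pieces as well. Because ( and ) are allowed operators, the parentheses from quit() survive in the first example below, so check that the result is still a valid expression before evaluating it:
enter func: land_size + 2 * building_size quit()
land_size+2*building_size()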
enter func: 1337+land_size h4x'; DROP TABLE members;
1337+land_size
What's more, you can easily modify the code to feed the actual values into the equation once it is sanitized. That is, go from `land_size+2*building_size` to `100+2*200` with a simple `replace` call. The result can then be parsed by `eval` or `ast`.
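To make the substitution step concrete, here is a minimal sketch that replaces each parameter name in an already sanitized string and then evaluates the result by walking the `ast` tree rather than calling `eval` directly. The names `substitute`, `evaluate`, and the sample `values` are illustrative assumptions, not part of the original code.

```python
import ast
import operator

# Arithmetic node types this sketch is willing to evaluate.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def substitute(formula, values):
    """Replace each parameter name with its numeric value."""
    # Longest names first, so a name that is a prefix of another is not clobbered.
    for name in sorted(values, key=len, reverse=True):
        formula = formula.replace(name, str(values[name]))
    return formula

def evaluate(expr):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

values = {"land_size": 100, "building_size": 200}  # sample values
expr = substitute("land_size+2*building_size", values)
print(expr)            # 100+2*200
print(evaluate(expr))  # 500
```

Walking the tree this way means that anything other than numbers and the four arithmetic operators, such as a leftover call like `200()`, raises `ValueError` instead of being executed.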
The code is below:
import re

# find the (start, end) span of every occurrence of a given char
def find_spans(ch, s):
    return [(i, i + 1) for i, ltr in enumerate(s) if ltr == ch]

# check to see if an unknown is a number
# note: float() also accepts strings such as 'inf', 'nan' and '1e5'
def is_number(s):
    try:
        float(s)
    except ValueError:
        return False
    return True

# these are the params you will allow
# change these to add/remove parameters/operators
allowed_params = ['land_size', 'building_size']
operators = ['+', '-', '*', '/', '(', ')']

# get input
in_formula = input('enter func: ')

# dictionary that will hold every allowed function element found in the input and its position(s)
found_params = {}

# extract param indices
for param in allowed_params:
    found_params[param] = [m.span() for m in re.finditer(re.escape(param), in_formula)]

# extract operator indices
for op in operators:
    found_params[op] = find_spans(op, in_formula)

# get all index regions that are "approved", that is, they are either a param or operator
allowed_indices = sorted(span for spans in found_params.values() for span in spans)

# these help remove anything unapproved at beginning or end
allowed_indices.insert(0, (0, 0))
allowed_indices.append((len(in_formula), len(in_formula)))

# find all index ranges that have not been approved
unknown_indices = [(allowed_indices[i - 1][1], allowed_indices[i][0])
                   for i in range(1, len(allowed_indices))
                   if allowed_indices[i][0] != allowed_indices[i - 1][1]]

# of all the unknowns, check to see if any are numbers (whitespace is stripped from the key)
numbers_indices = [(''.join(in_formula[start:end].split()), (start, end))
                   for start, end in unknown_indices
                   if is_number(in_formula[start:end])]

# add these to our final dictionary
for number, span in numbers_indices:
    found_params.setdefault(number, []).append(span)

# get final order of extracted parameters, sorted by start position
final_order = sorted((span[0], key) for key, spans in found_params.items() for span in spans)

# put all function elements back into a string
final_function = ''.join(key for _, key in final_order)

#
# here you could replace the parameters in the final function with their actual values
# and then evaluate using eval()
#
print(final_function)
Let me know if something doesn't make sense and I'd be glad to explain it.