I have some messy data that I'm passing through a function. The function below tries to take an average. Sometimes items in the list aren't numbers, and converting them throws an error.
I tried using a regex to strip non-numeric characters, but some values are still getting through. Whenever a bad value shows up (due to the messy data), I just want a 0 recorded for that item in the list.
    def mean(vals):
        if len(vals) == 0:
            return 0.0
        for val in vals:
            val = re.sub("[^0-9.]", "", str(val))
        print vals
        vals = [float(val) for val in vals]
        return sum(vals) / len(vals)
I'm printing the list of vals just to see where I'm throwing an error. The last vals list is:
['</a>']
How is this possible, given I've regexed everything that isn't a number or a period?
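
For reference, here is a minimal sketch of the behavior I'm after, assuming the cause is that assigning to `val` inside the `for` loop only rebinds the loop variable and never changes the list itself. The helper name `to_number` is made up for illustration. It builds a new list instead, and falls back to 0.0 when the cleaned string still isn't a valid float (for example an empty string, or something like "1.2.3"):

```python
import re

def to_number(val):
    """Hypothetical helper: return val as a float, or 0.0 if it can't be parsed."""
    # Same regex as in the question: drop everything except digits and periods.
    cleaned = re.sub(r"[^0-9.]", "", str(val))
    try:
        return float(cleaned)
    except ValueError:
        # Raised for "", ".", "1.2.3", etc.; record a 0 for this item.
        return 0.0

def mean(vals):
    if len(vals) == 0:
        return 0.0
    # Build a new list; rebinding the loop variable would leave vals unchanged.
    nums = [to_number(val) for val in vals]
    return sum(nums) / len(nums)
```

With this version, `mean(['</a>'])` gives 0.0 instead of raising. Note that the regex also strips minus signs, so negative values would come out positive.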