All generalizations are false (irony intended). One cannot say that try: except: is always faster than regex or vice versa. In your case, regex is not overkill and would be much faster than the try: except: method. However, based on our discussions in the comments section of your question, I went ahead and implemented a C library that efficiently performs this conversion (since I see this question a lot on SO); the library is called fastnumbers. Below are timing tests using your try: except: method, using regex, and using fastnumbers.
from __future__ import print_function
import timeit
prep_code = '''\
import random
import string
x = [''.join(random.sample(string.ascii_letters, 7)) for _ in range(10)]
y = [str(random.randint(0, 1000)) for _ in range(10)]
z = [str(random.random()) for _ in range(10)]
'''
try_method = '''\
def converter_try(vals):
    resline = []
    for item in vals:
        try:
            resline.append(int(item))
        except ValueError:
            try:
                resline.append(float(item))
            except ValueError:
                resline.append(item)
'''
re_method = '''\
import re
int_match = re.compile(r'[+-]?\d+$').match
float_match = re.compile(r'[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?$').match
def converter_re(vals):
    resline = []
    for item in vals:
        if int_match(item):
            resline.append(int(item))
        elif float_match(item):
            resline.append(float(item))
        else:
            resline.append(item)
'''
fn_method = '''\
from fastnumbers import fast_real
def converter_fn(vals):
    resline = []
    for item in vals:
        resline.append(fast_real(item))
'''
print('Try with non-number strings', timeit.timeit('converter_try(x)', prep_code+try_method), 'seconds')
print('Try with integer strings', timeit.timeit('converter_try(y)', prep_code+try_method), 'seconds')
print('Try with float strings', timeit.timeit('converter_try(z)', prep_code+try_method), 'seconds')
print()
print('Regex with non-number strings', timeit.timeit('converter_re(x)', prep_code+re_method), 'seconds')
print('Regex with integer strings', timeit.timeit('converter_re(y)', prep_code+re_method), 'seconds')
print('Regex with float strings', timeit.timeit('converter_re(z)', prep_code+re_method), 'seconds')
print()
print('fastnumbers with non-number strings', timeit.timeit('converter_fn(x)', prep_code+fn_method), 'seconds')
print('fastnumbers with integer strings', timeit.timeit('converter_fn(y)', prep_code+fn_method), 'seconds')
print('fastnumbers with float strings', timeit.timeit('converter_fn(z)', prep_code+fn_method), 'seconds')
print()
The output looks like this on my machine:
Try with non-number strings 55.1374599934 seconds
Try with integer strings 11.8999788761 seconds
Try with float strings 41.8258318901 seconds
Regex with non-number strings 11.5976541042 seconds
Regex with integer strings 18.1302199364 seconds
Regex with float strings 19.1559209824 seconds
fastnumbers with non-number strings 4.02173805237 seconds
fastnumbers with integer strings 4.21903610229 seconds
fastnumbers with float strings 4.96900391579 seconds
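A few things are pretty clear:
try: except: is very slow for non-numeric input (about 55 seconds here); regex beats that handily (about 12 seconds).
try: except: becomes more efficient if exceptions don't need to be raised; for integer strings it is faster than regex (about 12 seconds versus 18 seconds).
fastnumbers beats the pants off both in all cases.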
So, if you don't want to use fastnumbers, you need to assess whether you are more likely to encounter invalid strings or valid strings, and base your algorithm choice on that.
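One rough way to make that assessment is to time both approaches on a sample that looks like your real data. The following is only a sketch: it reuses the two regex patterns from the benchmark above, the names convert_try, convert_re and compare are made up for illustration, and the sample list is a hypothetical stand-in for your own input.

```python
import re
import timeit

# Same patterns as in the benchmark above.
int_match = re.compile(r'[+-]?\d+$').match
float_match = re.compile(r'[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?$').match


def convert_try(vals):
    """Convert each string to int, then float, else leave it unchanged."""
    out = []
    for item in vals:
        try:
            out.append(int(item))
        except ValueError:
            try:
                out.append(float(item))
            except ValueError:
                out.append(item)
    return out


def convert_re(vals):
    """Same conversion, but check the string's shape with a regex first."""
    out = []
    for item in vals:
        if int_match(item):
            out.append(int(item))
        elif float_match(item):
            out.append(float(item))
        else:
            out.append(item)
    return out


def compare(sample, number=1000):
    """Return (try_seconds, regex_seconds) for `number` passes over `sample`."""
    t_try = timeit.timeit(lambda: convert_try(sample), number=number)
    t_re = timeit.timeit(lambda: convert_re(sample), number=number)
    return t_try, t_re


if __name__ == '__main__':
    # Hypothetical sample: replace it with a slice of your real input.
    sample = ['12', '3.5', 'abc', '7', 'n/a', '0.25'] * 100
    t_try, t_re = compare(sample)
    print('try: except:', t_try, 'seconds')
    print('regex', t_re, 'seconds')
```

Whichever number is smaller for a representative sample is probably the better choice for that data. Note that the two functions only agree on clean input: int() and float() accept surrounding whitespace, which these regex patterns reject.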