The other answers do a good job explaining how you can change your list comprehension to correctly handle the non-alphanumeric case. I'd like instead to tackle the assumption that a list comprehension is always significantly faster than a conventional loop.
That is often true, but you can frequently modify your loop to make up most or all of the lost ground. In particular, looking up the append method on the list is relatively slow, since it involves a dictionary lookup and the creation of a bound method object. You can change the code to do the lookup once, before the loop, and your code may end up faster than most of the other versions:
values = []
values_append = values.append  # cache this method lookup
for char in string:
    if char.isalpha():
        values_append(1)  # use the cached method here
    elif char.isdigit():
        values_append(2)  # and here
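To make the "bound method object" point concrete, here is a minimal sketch of what each lookup of `values.append` produces in CPython. The names `first` and `second` are mine, chosen only for illustration.

```python
values = []

# Each attribute lookup builds a fresh bound method object.
first = values.append
second = values.append

print(first is second)  # False: two separate objects were created
print(first == second)  # True: both wrap the same method on the same list

# Either cached name still appends to the original list.
first(1)
second(2)
print(values)  # [1, 2]
```

Inside a loop, writing `values.append(...)` repeats that lookup on every iteration, which is the cost the cached name avoids.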
Here are some test timings, using a one-million-character string:
import random, timeit

big_str = "".join(random.choice(['a', '1', '~']) for _ in range(1000000))

def loop_cyon(string):
    values = []
    for char in string:
        if char.isalpha():
            values.append(1)
        elif char.isdigit():
            values.append(2)
    return values

def comp_silvio_mayolo(string):
    return [1 if char.isalpha() else 2 for char in string if char.isdigit() or char.isalpha()]

def comp_amadan1(string):
    return [1 if char.isalpha() else 2 for char in string if char.isalnum()]

def comp_amadan2(string):
    return list(filter(None, (1 if char.isalpha() else 2 if char.isalnum() else None for char in string)))

def loop_blckknght(string):
    values = []
    values_append = values.append
    for char in string:
        if char.isalpha():
            values_append(1)
        elif char.isdigit():
            values_append(2)
    return values

for func in [loop_cyon, comp_silvio_mayolo, comp_amadan1, comp_amadan2, loop_blckknght]:
    # print the name and the total time for 10 calls on one line
    print(func.__name__, timeit.timeit(lambda: func(big_str), number=10))
Output on my system (64-bit Windows 10, Python 3.6):
loop_cyon 2.5896435911574827
comp_silvio_mayolo 2.6970998627145946
comp_amadan1 2.177768147485949
comp_amadan2 2.676028711925028
loop_blckknght 2.244682003625485
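So it looks like the best list comprehension is still a little bit faster than my loop code, but not by much (about 3% in this run), while caching the method lookup cut the explicit loop's time by roughly 13% compared to the original loop. And I'd certainly say that the explicit loop is clearer in this situation, and that clarity may be more important than the performance differences.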
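If you want to check this on your own machine, a single `timeit.timeit` run can be noisy. The sketch below is one possible way to compare the approaches more robustly: it first confirms that the variants agree, then takes the best of several repeats. The function names `loop_plain`, `loop_cached`, and `comprehension`, and the 100,000-character input size, are my own choices for illustration rather than anything from the question.

```python
import random
import timeit

def loop_plain(string):
    values = []
    for char in string:
        if char.isalpha():
            values.append(1)
        elif char.isdigit():
            values.append(2)
    return values

def loop_cached(string):
    values = []
    values_append = values.append  # look the method up once
    for char in string:
        if char.isalpha():
            values_append(1)
        elif char.isdigit():
            values_append(2)
    return values

def comprehension(string):
    return [1 if char.isalpha() else 2 for char in string if char.isalnum()]

# Assumed input: shorter than the one-million-character string, so it runs quickly.
sample = "".join(random.choice("a1~") for _ in range(100_000))

funcs = [loop_plain, loop_cached, comprehension]

# The variants must agree before their timings are worth comparing.
expected = loop_plain(sample)
assert all(func(sample) == expected for func in funcs)

# The minimum over several repeats is less sensitive to background noise.
best = {
    func.__name__: min(timeit.repeat(lambda: func(sample), number=5, repeat=3))
    for func in funcs
}
for name, seconds in best.items():
    print(f"{name:15s} {seconds:.4f}")
```

The absolute numbers will differ between machines and Python versions, so treat the ranking as something to measure rather than assume.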