5

I'm doing some string parsing and want to return 1 if the character is a letter, 2 if the character is a number, and pass if the character is anything else.

Normally I would just use a for loop and append to it like this

string = 'hello123...'
values = []
for char in string:
    if char.isalpha():
        values.append(1)
    elif char.isdigit():
        values.append(2)

which returns

[1, 1, 1, 1, 1, 2, 2, 2]

as expected. According to https://stackoverflow.com/a/30245465/10029403 using list comprehensions can be much faster. So I tried:

values = [1 if char.isalpha else 2 if char.isdigit for char in string]

However, this is giving me a syntax error as an 'else' is expected.

File "C:/Users/test3.py", line 12
values = [1 if char.isalpha else 2 if char.isdigit for char in string]
                                                     ^
SyntaxError: invalid syntax

I want it to add nothing if the character is anything but alphanumeric. What is wrong with my code?

I will have to execute this function possibly billions of times so if there is a better way to do this, any increase in efficiency is welcome.

smci
  • 32,567
  • 20
  • 113
  • 146
Cyon
  • 414
  • 1
  • 4
  • 13
  • Please add the stack trace to the post. Also, you need to be calling your functions like `char.isalpha()` and `char.isdigit()` – aydow Jul 30 '18 at 03:36
  • Just my opinion, but I think this will hurt the readability of your program. List comprehension are pretty compact and powerful, but can turn a bit cryptic when abused. Also, is alpha is a method, but the second statement is an attribute. Can you provide a reproducible example? – JAponte Jul 30 '18 at 03:38
  • look [here](https://stackoverflow.com/questions/9987483/elif-in-list-comprehension-conditionals) – Nidhin Sajeev Jul 30 '18 at 03:40
  • 1
    Trying to use a list comprehension because it's "much faster" is probably a bad idea. In an empty loop that does nothing but copy one list to another, a list comprehension is about 20% faster (but just calling `list` is even faster). The more actual work there is to do, the smaller the difference gets. And it's rare that you have code in Python where that makes a difference—usually it's already more than fast enough, and when it isn't, usually you need to look at much bigger changes like running in PyPy or writing a Cython extension or using NumPy or something before worrying about 10%. – abarnert Jul 30 '18 at 03:40
  • @NidhinSajeev the point of this question is that the OP has no final `else`. – AChampion Jul 30 '18 at 03:41
  • @abarnert why else would one use a list comprehension? It sacrifices readability for speed I thought. This question is also for the sake of my curiosity and hopefully learning new ways to optimize simple procedures in Python. Thanks for the replies! – Cyon Jul 30 '18 at 03:54
  • 2
    On the contrary, many times a comprehension is more readable, if kept to a reasonable complexity. Many people dislike reading five lines of code if one will suffice, as long as that one is well thought out. – Amadan Jul 30 '18 at 03:57
  • list comprehensions do not necessarily sacrifice readability. Used well I find they make some code easier to read. – AChampion Jul 30 '18 at 03:58
  • Successively calling list.append() as you do is not great style. Better to use a generator, or generator expression, if you want performance while preserving legibility. List-comprehensions with multiple or-clauses are not a great idea. – smci Jul 30 '18 at 04:00
  • @smci: why would it be O(N^2)? `list.append` is O(1), making repeated appending O(N), AFAIK; do you have any resource that contradicts it? – Amadan Jul 30 '18 at 04:03
  • @smci Awesome, I didn't know that. I'm pretty new to python so I'll look that up. *edit it appears https://wiki.python.org/moin/TimeComplexity says it's O(1). – Cyon Jul 30 '18 at 04:04
  • @smci Any individual `list.append` is worst-case `O(N)`, but calling it repeatedly is amortized `O(1)`. This is a trivial consequence of lists growing geometrically, just like similar dynamic arrays in almost every language/library. – abarnert Jul 30 '18 at 04:06
  • @Amadan, abarnert: my mistake, yes in Python it is O(1)*N = O(N). But iterative append is still not good style. Generator expression is cleaner. – smci Jul 30 '18 at 04:07
  • @smci Where did generator expressions come into it? If the OP needs a list, not just an arbitrary iterable, either `append` or a listcomp is much clearer (as well as much efficient than) a genexpr. (Obviously if he can rewrite the code to not need a list, that would be a huge win, but not because `append` is bad style; because building a list that you don't need is bad…) – abarnert Jul 30 '18 at 04:10
  • FYI we don't have any good canonical for *"List comprehension with multiple clauses"*; [this one](https://stackoverflow.com/questions/48021589/python-adding-multiple-or-clauses-in-python-if-statement-using-list-compreh) comes up but is misnamed and also was closed in favor of of a non-list-comprehension answer. – smci Jul 30 '18 at 04:13

3 Answers3

6

If you don't want to consider the particular element at all, you should include that conditional at the end of the comprehension as a guard.

[1 if char.isalpha() else 2 for char in string if char.isdigit() or char.isalpha()]

The if char.isdigit() or char.isalpha() at the end eliminates any elements which fail to satisfy those predicates.

That being said, I recommend factoring at least the translation part (and possibly the conditional as well) to a separate function for the sake of readability. This one-liner, while clever-looking, it not terribly readable.

Silvio Mayolo
  • 62,821
  • 6
  • 74
  • 116
  • Awesome thanks! I overlooked a different way I could've approached this. – Cyon Jul 30 '18 at 03:55
  • 3
    @Cyon If you really want to use this as an optimization, you really need to benchmark with `timeit` against your actual code. Because there's a very good chance the cost of an extra function call and an extra conditional is going to outweigh the benefits of the append optimization built into list comprehensions. – abarnert Jul 30 '18 at 04:08
3

The other answers do a good job explaining how you can change your list comprehension to correctly handle the non-alphanumeric case. I'd like instead to tackle the assumption that a list comprehension is always significantly faster than a conventional loop.

That is often true, but a lot of the time you can modify your loop to make up most or all of the lost ground. In particular, looking up the append method on the list is relatively slow, since it involves looking up something in a dictionary and creating a bound method object. You can change the code to do the lookup once, before the loop, and your code may end up faster than any of the other versions:

values = []
values_append = values.append   # cache this method lookup
for char in string:
    if char.isalpha():
        values_append(1)        # used cached method here
    elif char.isdigit():
        values_append(2)        # and here

Here's some test timings, using a one-million character string:

import random, timeit

big_str = "".join(random.choice(['a', '1', '~']) for _ in range(1000000))

def loop_cyon(string):
    values = []
    for char in string:
        if char.isalpha():
            values.append(1)
        elif char.isdigit():
            values.append(2)
    return values

def comp_silvio_mayolo(string):
    return [1 if char.isalpha() else 2 for char in string if char.isdigit() or char.isalpha()]

def comp_amadan1(string):
    return [1 if char.isalpha() else 2 for char in string if char.isalnum()]

def comp_amadan2(string):
    return list(filter(None, (1 if char.isalpha() else 2 if char.isalnum() else None for char in string)))

def loop_blckknght(string):
    values = []
    values_append = values.append
    for char in string:
        if char.isalpha():
            values_append(1)
        elif char.isdigit():
            values_append(2)
    return values

for func in [loop_cyon, comp_silvio_mayolo, comp_amadan1, comp_amadan2, loop_blckknght]:
    print(func.__name__)
    timeit.timeit(lambda: func(big_str), number=10)

Output on my system (Windows 10 64x, Python 3.6):

loop_cyon 2.5896435911574827
comp_silvio_mayolo 2.6970998627145946
comp_amadan1 2.177768147485949
comp_amadan2 2.676028711925028
loop_blckknght 2.244682003625485

So it looks like the best list comprehension is still a little bit faster than my loop code, but not by much. And I'd certainly say that the explicit loop is clearer in this situation, and that clarity may be more important than the performance differences.

Blckknght
  • 100,903
  • 11
  • 120
  • 169
  • I didn't know you could use append like that. Thanks! – Cyon Jul 30 '18 at 04:16
  • Yep, bound methods, like functions, are first class objects in Python. You can assign them to variables to call later, or do whatever you want with them. It's sometimes useful to pass one to another function (such as using one as a `key` function in `list.sort`). – Blckknght Jul 30 '18 at 05:46
2

Unfortunately, Python doesn't allow you to do that in a comprehension. However, you can do this:

values = [1 if char.isalpha() else 2 for char in string if char.isalnum()]

(where char.isalnum() if and only if char.isalpha() or char.isdigit().)

You can also do

values = [value for value in (1 if char.isalpha() else 2 if char.isalnum() else None for char in string) if value]

or its equivalent (thanks for the reminder, AChampion)

values = list(filter(None, (1 if char.isalpha() else 2 if char.isalnum() else None for char in string)))

to get rid of the None, though I am not sure how fast/not fast it is (it is not building a new list, just using a generator, so the slowdown might be quite imperceptible).

Amadan
  • 191,408
  • 23
  • 240
  • 301
  • Any reason not to use `filter(None, ...)` to remove the `None`s. – AChampion Jul 30 '18 at 03:53
  • @AChampion: `None.` I mean, none. – Amadan Jul 30 '18 at 03:55
  • Thanks! It's nice to know Python actually can't do this rather than some silly typo I overlooked. – Cyon Jul 30 '18 at 03:57
  • 1
    It's not just Python. The issue is that `x if z else y`, just like `z ? x : y` in other languages, is an _expression_, and an expression needs to return a value; combined with the fact that the `e` in `[e for x in y if z]` must be an expression, and the fact that `if z` in comprehension is syntactically completely separate from `e`. – Amadan Jul 30 '18 at 04:01