66

I have a list comprehension which approximates to:

[f(x) for x in l if f(x)]

Where l is a list and f(x) is an expensive function which returns a list.

I want to avoid evaluating f(x) twice for every non-empty occurance of f(x). Is there some way to save its output within the list comprehension?

I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.

Edit:

Two basic approaches have been suggested:

An inner generator comprehension:

[y for y in (f(x) for x in l) if y]

or memoization.

I think the inner generator comprehension is elegant for the problem as stated. In actual fact I simplified the question to make it clear, I really want:

[g(x, f(x)) for x in l if f(x)]

For this more complicated situation, I think memoization produces a cleaner end result.

Stefan
  • 8,819
  • 10
  • 42
  • 68
  • 5
    well you can indeed solve it with a generator comprehension even in this case, simply by `[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]`. the main point is if there is any duplication in x. – EnricoGiampieri Apr 04 '13 at 14:00
  • Thanks, it seems everything can be solved with comprehensions! I still think once the expression has gotten that complicated that memoization makes for more readable code. – Stefan Apr 04 '13 at 14:03
  • 2
    Yes, use a generator (with parentheses, not square brackets). If you prefer memoization that's fine, but a generator is _much_ better than building and then filtering the entire list, as you have now. (For example, it could be used if the inner generator is infinite, and the outer comprehension stops when it finds a certain value). – alexis Apr 04 '13 at 14:07

12 Answers12

51
[y for y in (f(x) for x in l) if y]

Will do.

cs95
  • 379,657
  • 97
  • 704
  • 746
RobertT
  • 4,300
  • 3
  • 29
  • 36
13

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it's possible to use a local variable within a list comprehension in order to avoid calling twice the same function:

In our case, we can name the evaluation of f(x) as a variable y while using the result of the expression to filter the list but also as the mapped value:

[y for x in l if (y := f(x))]
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
11

A solution (the best if you have repeated value of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument by which the function is called and save it, than return it if the same value is asked.

a really simple implementation is the following:

storage = {}
def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]

and then use this function in the list comprehension. This approach is valid under two condition, one theoretical and one practical. The first one is that the function f should be deterministic, i.e. returns the same results given the same input, and the other is that the object x can be used as a dictionary keys. If the first one is not valid than you should recompute f each timeby definition, while if the second one fails it is possible to use some slightly more robust approaches.

You can find a lot of implementation of memoization around the net, and I think that the new versions of python have something included in them too.

On a side note, never use the small L as a variable name, is a bad habit as it can be confused with an i or a 1 on some terminals.

EDIT:

as commented, a possible solution using generators comprehension (to avoid creating useless duplicate temporaries) would be this expression:

[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]

You need to weight your choice given the computational cost of f, the number of duplication in the original list and memory at you disposition. Memoization make a space-speed tradeoff, meaning that it keep tracks of each result saving it, so if you have huge lists it can became costly on the memory occupation front.

EnricoGiampieri
  • 5,947
  • 1
  • 27
  • 26
  • 2
    Pretty much my answer, only you could just use a @memoize decorator. – Inbar Rose Apr 04 '13 at 13:41
  • this is a rough implementation of the memoization decorator for explaination purpose indeed. about performances, it is worst of the other solution only if each x is different from each others. if even one is the same as the other it will just suffer from the overhead of the dictionary that is marginal given that f has been defined as a costly function. this mean that even with one repetition it will gain a lot in terms of function call reductions. it's not essential as other solutions but it's more robust... – EnricoGiampieri Apr 04 '13 at 13:43
  • I edited my comment for a better explanation. I don't personally like memoization, but in this kind of problems I guess it is the more robust solution, and I didn't use it like a silver bullet without thinking. – EnricoGiampieri Apr 04 '13 at 13:48
  • 5
    Is this really useful though? Your function globally changes `storage` which means that you need to reset `storage` before you call this on a new set of data -- All so you can use a list-comprehension easily. IMHO, this isn't worth it. Just use a loop. – mgilson Apr 04 '13 at 14:05
  • as before, if f is deterministic it would be even better, as any value in the second iteration that has been seen before won't need a second evaluation. In fact, the most often you need to perform the cycle, the more the memoization is useful. Each tecnique has it's good and bad side. – EnricoGiampieri Apr 04 '13 at 14:17
10

You should use a memoize decorator. Here is an interesting link.


Using memoization from the link and your 'code':

def memoize(f):
    """ Memoization decorator for functions taking one or more arguments. """
    class memodict(dict):
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def f(x):
    # your code

[f(x) for x in l if f(x)]
Inbar Rose
  • 41,843
  • 24
  • 85
  • 131
  • 1
    This is my favorite of the memoization options provided here. You still have access to the function via `f.f(...)`, it keeps state on a per-function basis (as opposed to on a global basis), it uses `dict.__missing__` which is so useful. You have my upvote. – mgilson Apr 04 '13 at 14:23
9
[y for y in [f(x) for x in l] if y]

For your updated problem, this might be useful:

[g(x,y) for x in l for y in [f(x)] if y]
Vaughn Cato
  • 63,448
  • 5
  • 82
  • 132
  • Fair enough ... You win, there *is* a way to do this. I would still use a loop though :-P – mgilson Apr 04 '13 at 13:35
  • You can even make the inner comprehension a generator to save on duplicate list creation. – 9000 Apr 04 '13 at 13:35
  • 1
    I *might* use this if I factored the inner generator into a temporary variable ahead of time... – mgilson Apr 04 '13 at 13:37
  • 1
    This **looks** like @Mahdi's solution, but it will build the whole list before it filters it. Mahdi's is better: It will create a generator and pump it one value at a time. – alexis Apr 04 '13 at 14:03
8

Nope. There's no (clean) way to do this. There's nothing wrong with a good-old-fashioned loop:

output = []
for x in l:
    result = f(x)
    if result: 
        output.append(result)

If you find that hard to read, you can always wrap it in a function.

mgilson
  • 300,191
  • 65
  • 633
  • 696
8

As the previous answers have shown, you can use a double comprehension or use memoization. For reasonably-sized problems it's a matter of taste (and I agree that memoization looks cleaner, since it hides the optimization). But if you're examining a very large list, there's a huge difference: Memoization will store every single value you've calculated, and can quickly blow out your memory. A double comprehension with a generator (round parens, not square brackets) only stores what you want to keep.

To come to your actual problem:

[g(x, f(x)) for x in series if f(x)]

To calculate the final value you need both x and f(x). No problem, pass them both like this:

[g(x, y) for (x, y) in ( (x, f(x)) for x in series ) if y ]

Again: this should be using a generator (round parens), not a list comprehension (square brackets). Otherwise you will build the whole list before you start filtering the results. This is the list comprehension version:

[g(x, y) for (x, y) in [ (x, f(x)) for x in series ] if y ] # DO NOT USE THIS
alexis
  • 48,685
  • 16
  • 101
  • 161
  • beside readability, whould be `[g(x, y) for x in series for y in [f(x)] if y] ` different in memory occupation?when the memorized list will be cleaned up (depends on where memorized function is defined)? – user110954 Sep 14 '17 at 17:41
  • The *eventual* list computed is the same, but you'll take longer to get there (and garbage collecting takes time too). – alexis Sep 15 '17 at 14:08
  • 1
    PS. But don't take my word for it, try it out and time it! – alexis Sep 17 '17 at 11:42
4

There have been a lot of answers regarding memoizing. The Python 3 standard library now has a lru_cache, which is a Last Recently Used Cache. So you can:

from functools import lru_cache

@lru_cache()
def f(x):
    # function body here

This way your function will only be called once. You can also specify the size of the lru_cache, by default this is 128. The problem with the memoize decorators shown above is that the size of the lists can grow well out of hand.

Games Brainiac
  • 80,178
  • 33
  • 141
  • 199
  • This looks useful. If I understand correctly how the list comprehension works, I could use an `lru_cache` size of 1, since I only ever want to re-use the most recently used value? – Stefan Sep 10 '14 at 11:26
3

You can use memoization. It is a technique which is used in order to avoid doing the same computation twice by saving somewhere the result for each calculated value. I saw that there is already an answer that uses memoization, but I would like to propose a generic implementation, using python decorators:

def memoize(func):
    def wrapper(*args):
        if args in wrapper.d:
            return wrapper.d[args]
        ret_val = func(*args)
        wrapper.d[args] = ret_val
        return ret_val
    wrapper.d = {}
    return wrapper

@memoize
def f(x):
...

Now f is a memoized version of itself. With this implementation you can memoize any function using the @memoize decorator.

ehudt
  • 810
  • 7
  • 13
3

Use map() !!

comp = [x for x in map(f, l) if x]

f is the function f(X), l is the list

map() will return the result of f(x) for each x in the list.

Guillaume Lebreton
  • 2,586
  • 16
  • 25
1

Here is my solution:

filter(None, [f(x) for x in l])
Eastsun
  • 18,526
  • 6
  • 57
  • 81
  • 2
    As with the others, you can use a generator inside `filter` -- And I usually prefer `filter(bool,...)` to `filter(None,...)`. The do the same thing, but I find the former to be more explicit. e.g. I think this would be better as `filter(bool,(f(x) for x in l))` – mgilson Apr 04 '13 at 14:20
-2

How about defining:

def truths(L):
    """Return the elements of L that test true"""
    return [x for x in L if x]

So that, for example

> [wife.children for wife in henry8.wives]
[[Mary1], [Elizabeth1], [Edward6], [], [], []]

> truths(wife.children for wife in henry8.wives) 
[[Mary1], [Elizabeth1], [Edward6]]
Colonel Panic
  • 132,665
  • 89
  • 401
  • 465