9

In this question, I'm having an argument with a commenter who argues that

for t in threads:
    t.join()

would be better than

[t.join() for t in threads]

Leaving the matter of "abusing comprehensions" aside - I tend to agree but I would like a one-liner for this: How (in-)efficient is my version (the second one) really?. Does Python materialize list comprehensions always / in my case or does it use a generator internally?

Would map(lambda t: t.join(), threads) be more efficient? Or is there another way to apply the function to each element in the list threads?

Community
  • 1
  • 1
Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
  • With inefficient you mean how much longer would it take for it to complete it's full operation, or how much resource it would take for each scenario? Because, it's quite easy to use the `.tick()` function or simply use the `time.time()` to differentiate the two operations in terms of speed. That aside, my brain tells me the second solution would take longer to execute than the two-liner. – Torxed Jan 29 '14 at 10:45
  • 7
    "I would like a one-liner for this". So remove the linebreak, `for t in threads: t.join()`. If that's less readable, then *a one-liner is less readable* and you shouldn't want one. So work on curing yourself of the desire for one-liners :-) – Steve Jessop Jan 29 '14 at 10:48
  • @Torxed: I was wondering if I could avoid the memory overhead. – Aaron Digulla Jan 29 '14 at 10:53
  • @SteveJessop: It just feels odd that I can't say `foreach(threads).join()` or something :-) – Aaron Digulla Jan 29 '14 at 10:54
  • 1
    @AaronDigulla you actually could do that, if you made foreach return a proxy object where \_\_getattr\_\_ generated functions that called the named method on the wrapped items. But you probably shouldn't. ;) – babbageclunk Jan 29 '14 at 10:59

5 Answers5

10

A list comprehension will always produce a list object, in this case with the return values of all the t.join() calls. Python thus produces as list with None values of length len(threads) for you. Python will never try to optimize away the list object creation.

Using map() is also not any more efficient as you add additional stack pushes with the lambda. Just stick with the explicit for loop.

Really, for a series of thread joins there is no point in trying to micro optimize here. You are hurting readability for a non-critical piece of code.

In other words, I entirely agree with the commenter. Do not use a list comprehension or map() just for the side effects and saving yourself having to hit ENTER and create two lines of code.

Quoting the Zen of Python:

  • Readability counts.
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • +1 I agree to your points, I just wanted to make sure I'm not overlooking something. It does feel odd that I can't say `all(threads).join()` or something – Aaron Digulla Jan 29 '14 at 10:59
  • 2
    If you want it to read like that, write a function called `join_all_threads(list_of_threads)` which takes a list of threads to join. That way it is at least obvious what the function is meant to do. – three_pineapples Jan 29 '14 at 11:06
  • 1
    @AaronDigulla: Why is that odd? I find such post-fix operations odd instead; less readable to me. – Martijn Pieters Jan 29 '14 at 11:07
  • @MartijnPieters: I have this feeling that I'm overlooking something obvious. But maybe I have a leak in my JavaScript part of the brain ;-) – Aaron Digulla Jan 29 '14 at 11:24
2

You can make a foreach that works as you want. I would recommend against it, as it's not a normal python way of doing things. Untested, but along the lines of this:

class foreach:
    def __init__(self, object_list):
        self.__object_list = object_list

    def __getattr__(self, name):
        def f(*args, **kwargs):
            for obj in self.__object_list:
                getattr(obj, name)(*args, **kwargs)
        return f


foreach(thread_list).join()

It pretty much creates a sort of proxy object that forwards any call with any parameters to all objects.

(I will most likely have a good chat if you if you would do this in my development team, but it does serve as an example of what is possible in python)

Daid Braam
  • 173
  • 4
  • This is one of the cases where I thought three times before I upvoted :-) In a sense, it's a perfectly good example but I (like you) feel this is dangerous. – Aaron Digulla Oct 18 '17 at 15:43
1

If you want a one-liner for this, write an exhaust function, and use it in conjunction with a generator expression:

def exhaust(iterator):
    for _ in iterator:
        pass

exhaust(t.join() for t in threads)

Then you aren't paying the cost for all of the list storage.

Feel free to rename exhaust something more snappy or relevant to your usage.

I've seen this list-comprehension abuse quite a lot, even in interview coding samples where we've explicitly instructed candidates to avoid unbounded memory growth when solving the problem.

babbageclunk
  • 8,523
  • 1
  • 33
  • 37
  • 2
    `exhaust` can also be written `collections.deque(iterable, maxlen=0)`. I'm not sure whether that pushes the loop into C code, but it's allowed to, and it's what `itertools` recipes recommends. – Steve Jessop Jan 29 '14 at 10:51
  • Cute! Although maybe a little obtuse. – babbageclunk Jan 29 '14 at 10:51
  • 1
    But why all the 'cleverness' of a generator? Just use a regular loop and stop trying to confuse anyone reading the code. – Martijn Pieters Jan 29 '14 at 10:52
  • Well, I agree with you - I was just replying to the "I want a one-liner" part. – babbageclunk Jan 29 '14 at 10:53
  • That would get rid of the memory overhead at the cost of runtime overhead. What about `any(t.join() for t in threads)`? – Aaron Digulla Jan 29 '14 at 10:58
  • 1
    Well, any returns early if it hits a true result. So it works for t.join() which always returns None, but it's pretty subtle - it'd be easy for someone to then try that idiom somewhere else and be surprised when it fails. Do you mean the runtime overhead of a function call? I'd expect that to be negligible compared to the t.join() costs. (All of this is pretty academic - you should just do the for loop.) – babbageclunk Jan 29 '14 at 11:11
  • @babbageclunk: If I loop in the method, then I can just use `for t in threads: t.join()`. I'm really looking for a readable and compact solution (in that order). – Aaron Digulla Jan 29 '14 at 11:23
  • @babbageclunk: But along the lines of your comment: The best/only solution seems to be to write a new C function `each()` which works like `all()` and `any()` and which doesn't return anything. – Aaron Digulla Jan 29 '14 at 11:26
  • @AaronDigulla I guess - `each` would be the same as `exhaust`, but transliterated into C. (And even then, calling a C function isn't nonzero cost, although it is much faster.) What I was saying is that you can write the for loop once (in either Python or C) and then have one-liners everywhere you want to use that idiom. – babbageclunk Jan 29 '14 at 11:43
1

One situation where no-assignment list compressions are better than for loops are when assigning self or similar with locals().items():

[self.__setattr__(k, v) for k, v in locals().items() if k != 'self']

List compressions get their own scope, and doing the same with a regular loop like this:

for k, v in locals().items():
    if k != 'self':
        setattr(self, k, v)

would end up assigning both k and v because they were added to locals() after the first loop. Although I prefer to use it in conjunction with __slots__ like this:

[self.__setattr__(k, v) for k, v in locals().items() if k in self.cls.__slots__]

Where self.cls is type(self) or self.__class__, in which case both require equal if statements, but I still prefer the list compression even in that situation.

Break
  • 332
  • 3
  • 19
  • Took me a moment to understand what you're doing there. I wonder why I would want to copy all local variables into the class (instead of, say, collect the key+value in a `dict` and then use that). Also, why `self.cls` and not `self.__class__`? – Aaron Digulla Mar 04 '21 at 08:44
  • 1
    Oh, small mistake probably, but `self.cls` here is `self.__class__`. Maybe the `dict` method you mention could be better, but this is just the way I've always done. I use it to assign the parameters of `__init__` to the class instead of typing each one separately. – Break Mar 12 '21 at 01:36
0

If you want a one-liner why not just do for t in threads: t.join()?

Seems like the simplest solution to me, and likely to be the fastest too (though really, the overhead of joining threads is likely to dwarf anything else IMO)

three_pineapples
  • 11,579
  • 5
  • 38
  • 75
  • While this would work, I'm still looking for a readable solution. Like `foreach(threads).join()` or something. My feeling is that I'm overlooking something obvious. – Aaron Digulla Jan 29 '14 at 10:56
  • Well I can think of a way to create a function `foreach` which would work exactly as you have written it, but I think I'll keep it to myself and repeat the theme: make your code readable to other Python users. – three_pineapples Jan 29 '14 at 11:04