17

Today I was parsing a directory index with a list of paths to zip files using BeautifulSoup and came across an interesting thing. Let's assume I would like to take all the href properties of the tags I got and put them directly into a Queue:

q = Queue.Queue()
[q.put(tag['href']) for tag in soup.findAll('a')]

I've never run into a situation like this before where a comprehension could be used inline without assigning it to anything, just to generated another iterator through some routine call. Is this considered bad practice? Is it "pythonic", per se? Was there a better one-liner to put all the items into the queue?

DeaconDesperado
  • 9,977
  • 9
  • 47
  • 77
  • 9
    `one_liner != pythonic` :) – Andy Hayden Jan 10 '13 at 01:26
  • List comprehension always return a list. Therefore, it's a bad practice because you don't need the returned list in this case! If `q` supports adding the entire list into the queue object, that's fine, otherwise, bad. – CppLearner Jan 10 '13 at 01:28

5 Answers5

16

This has been asked many times, e.g., here and here. But it's an interesting question, though. List comprehensions are meant to be used for something else.

Other options include

  1. use map() - basically the same as your sample
  2. use filter() - if your function returns None, you will get an empty list
  3. Just a plain for-loop

while the plain loop is the preferable way to do it. It is semantically correct in this case, all other ways, including list comprehension, abuse concepts for their side-effect.

In Python 3.x, map() and filter() are generators and thus do nothing until you iterate over them. So we'd need, e.g., a list(map(...)), which makes it even worse.

Community
  • 1
  • 1
Thorsten Kranz
  • 12,492
  • 2
  • 39
  • 56
  • Sorry, tried to search "without assignment" to prevent dupes. – DeaconDesperado Jan 10 '13 at 01:30
  • 1
    No need to excuse - somtimes it's hard to find the right words for a search string – Thorsten Kranz Jan 10 '13 at 01:31
  • 1
    `filter()` and `map()` are worse choices. In Python 2.x they are eager--they return a list so you are guaranteed to get the side effects when they are called. In Python 3.x they are lazy--they return an iterator which you must iterate through and exhaust to get the side effects of the evaluation. Using `map()` or `filter()` for side-effects in Python 2.x will leave you with code that breaks in a weird an unexpected way in Python 3.x. – Peter Graham Jan 10 '13 at 02:01
  • Downvote: Map and filter does seem like poor choices in this case. – Nuclearman Jan 10 '13 at 03:54
9

There are many opinions on this thread, I can only speak from coding conventions at my organization.

there are many ways to affect a loop, but a key attribute of list comprehensions is that they create lists, with one item for each in the iterated over sequence.

>>> import Queue
>>> q = Queue.Queue()
>>> [q.put(item) for item in range(5)]
[None, None, None, None, None]
>>>

this unused list is obviously wasteful. As such, this construction, a list comprehension with unused return value; is forbidden from appearing in our code base. An explicit loop like above, or a generated combined with something that consumes it, for instance:

>>> any(q.put(item) for item in xrange(5))
False
>>>

or just:

>>> for item in xrange(5):
...     q.put(item)
...
>>>

is required to pass review.

SingleNegationElimination
  • 151,563
  • 33
  • 264
  • 304
  • 4
    The `any()` trick is interesting at first view, but it should be noted that `any()` stops traversing a structure as soon as it finds an item that evaluates to `True`. If for instance the expression `q.put(item)` returned a truthy value at some point, the processing would stop. By the time you come up with a foolproof construct using `any()` you realize you're better off simply just using an ordinary `for ... in`. – Michael Ekoka Apr 07 '20 at 13:41
  • What about `all` then for this purpose? – Break Feb 08 '21 at 02:27
  • Same problem, `all` stops on the first `False`. – Marcelotsvaz Jul 21 '23 at 19:06
6

If you think of it as a loop over the list returned by soup.findAll it would look like this:

for tag in soup.findAll('a'):
    q.put(tag['href'])

This is probably the more 'pythonic' form as 'explicit is better than implict'

timc
  • 2,124
  • 15
  • 13
2

Probably not a better one-liner, but I'd (personally) consider doing this instead of:

for tag in soup.findAll('a'):
    q.put(tag['href'])

to be bad practice. Firstly, the one liner will generate a list full of [None], which is likely more inefficient. Secondly, it's not immediately clear what the code is doing; seeing a list comprehension generally implies the resultant list will be used in some way.

Yuushi
  • 25,132
  • 7
  • 63
  • 81
  • Agreed, I'm probably not going to use it but the syntax made me curious. Without any reference, wouldn't the list of `None`s be garbage collected immediately? – DeaconDesperado Jan 10 '13 at 01:28
  • 1
    @DeaconDesperado Well, immediately being whenever the garbage collector is next run. Depending on the size of the list, however, the time to allocate the list and then garbage collect it could be significant. – Yuushi Jan 10 '13 at 01:34
0

You can use any() the way it's recommended here, in the latest versions of Python

q = Queue.Queue()
any(q.put(tag['href']) for tag in soup.findAll('a'))