47

I have a list and want to build (via a comprehension) another list. I would like this new list to be limited in size, via a condition.

The following code will fail:

a = [1, 2, 1, 2, 1, 2]
b = [i for i in a if i == 1 and len(b) < 3]

with

Traceback (most recent call last):
  File "compr.py", line 2, in <module>
    b = [i for i in a if i == 1 and len(b) < 3]
  File "compr.py", line 2, in <listcomp>
    b = [i for i in a if i == 1 and len(b) < 3]
NameError: name 'b' is not defined

because b is not defined yet at the time the comprehension is built.

Is there a way to limit the size of the new list at build time?

Note: I could break the comprehension into a for loop with the proper break when a counter is reached but I would like to know if there is a mechanism which uses a comprehension.
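
For reference, the loop-with-break fallback mentioned in the note could look roughly like this. It is only a sketch of the baseline the question wants to avoid, using the same `a` and the limit of 3 from the failing example:

```python
a = [1, 2, 1, 2, 1, 2]

b = []
for i in a:
    if i == 1:
        b.append(i)
        # Stop as soon as the new list has reached the desired size.
        if len(b) == 3:
            break

print(b)  # [1, 1, 1]
```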

WoJ

7 Answers

77

You can use a generator expression to do the filtering, then use islice() to limit the number of iterations:

from itertools import islice

filtered = (i for i in a if i == 1)
b = list(islice(filtered, 3))

This ensures you do no more work than necessary to produce those 3 elements.

Note that there is no point anymore in using a list comprehension here; a list comprehension can't be broken out of, you are locked into iterating to the end.
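
A small sketch to make the laziness visible. The `record` helper and the `seen` list are illustrative names that are not part of the answer, and the source here is deliberately infinite to show that iteration really stops early:

```python
from itertools import count, islice

seen = []  # every value actually pulled from the source

def record(x):
    # Hypothetical helper: remember which source items were consumed.
    seen.append(x)
    return x

source = (record(n) for n in count(1))   # infinite: 1, 2, 3, ...
odds = (n for n in source if n % 2 == 1)  # lazy filter, nothing runs yet
b = list(islice(odds, 3))                 # pulls only until 3 items match

print(b)     # [1, 3, 5]
print(seen)  # [1, 2, 3, 4, 5]
```

A list comprehension over the same source would never finish, since it has no way to stop before the input is exhausted.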

Martijn Pieters
  • `[1/i for i in range(-5, 5)]` does break out and doesn't iterate to the end. – Stefan Pochmann Feb 22 '17 at 14:40
  • 13
    @StefanPochmann: it raises an exception, that's *not the same thing* as a `break` statement. In the end, you have no list result at all. – Martijn Pieters Feb 22 '17 at 14:41
  • Wasn't clear to me that you meant the `break` statement, that word can be understood in a more general way. [For example](http://stackoverflow.com/a/38675546/1672429) not long ago you said *"[`return`] breaks out of the loop"*. In any case, the iteration doesn't go to the end. Also, not having a list result doesn't even have to be a problem. Consider `reciprocals = [1/x for x in a]`, I think that's reasonable code and if `a` contains a zero then one might want a `ZeroDivisionError` and not want a list. – Stefan Pochmann Feb 22 '17 at 15:08
  • 7
    This is a question about how to limit the size of the list produced by a list comprehension, though. That implies you *still want a list result*. – Martijn Pieters Feb 22 '17 at 15:10
6

@Martijn Pieters is completely right that itertools.islice is the best way to solve this. However, if you don't mind an additional (external) library, you can use iteration_utilities, which wraps a lot of these itertools and their applications (and some additional ones). It could make this a bit easier, at least if you like functional programming:

>>> from iteration_utilities import Iterable

>>> Iterable([1, 2, 1, 2, 1, 2]).filter((1).__eq__)[:2].as_list()
[1, 1]

>>> (Iterable([1, 2, 1, 2, 1, 2])
...          .filter((1).__eq__)   # like "if item == 1"
...          [:2]                  # like "islice(iterable, 2)"
...          .as_list())           # like "list(iterable)"
[1, 1]

The iteration_utilities.Iterable class uses generators internally, so it will only process as many items as necessary, and only once you call one of the as_* (or get_*) methods.


Disclaimer: I'm the author of the iteration_utilities library.

MSeifert
  • 1
    This is a very nice library, thanks (still reading the docs to get a grasp on the multitude of functions) – WoJ Feb 22 '17 at 20:23
  • 1
    Might I recommend changing the first link to the project's home page: http://iteration-utilities.readthedocs.io/en/latest/? – jpmc26 Feb 23 '17 at 01:25
  • 1
    Note that using `(1).__eq__` means you'll get unpleasant results like `1.5` comparing equal to `1`, or `'potato'` comparing equal to `1`, because `NotImplemented` is considered true in a boolean context. (They added a DeprecationWarning for this a few years back, but DeprecationWarning is suppressed by default outside of `__main__`.) – user2357112 Aug 08 '23 at 06:55
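
The point in the last comment can be checked with the standard library alone. The sketch below only inspects the return values (it avoids using `NotImplemented` as a boolean) and shows `functools.partial(operator.eq, 1)` as one possible safer predicate. That alternative is a suggestion made here, not something proposed in the answer:

```python
import operator
from functools import partial

# int.__eq__ does not return False for types it cannot compare with;
# it returns the NotImplemented singleton instead.
print((1).__eq__(1.5))        # NotImplemented
print((1).__eq__('potato'))   # NotImplemented

# operator.eq goes through the full == protocol, so it yields real booleans.
is_one = partial(operator.eq, 1)
print([x for x in [1, 1.5, 'potato', 1.0] if is_one(x)])  # [1, 1.0]
```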
4

You could use itertools.count to generate a counter and itertools.takewhile to stop iterating over a generator when the counter reaches the desired integer (3 in this case):

from itertools import count, takewhile
c = count()
b = list(takewhile(lambda x: next(c) < 3, (i for i in a if i == 1)))

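Or a similar idea: build a construct that raises StopIteration to terminate the generator. That is the closest you'll get to your original idea of breaking out of the list comprehension, but I would not recommend it as best practice. Note that it only works on Python 3.6 and earlier: since Python 3.7 (PEP 479), a StopIteration raised inside a generator expression is converted into a RuntimeError.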

c = count()
b = list(i if next(c) < 3 else next(iter([])) for i in a if i == 1)

Examples:

>>> a = [1,2,1,4,1,1,1,1]

>>> c = count()
>>> list(takewhile(lambda x: next(c) < 3, (i for i in a if i == 1)))
[1, 1, 1]

>>> # Python 3.6 and earlier only; Python 3.7+ raises RuntimeError here (PEP 479)
>>> c = count()
>>> list(i if next(c) < 3 else next(iter([])) for i in a if i == 1)
[1, 1, 1]
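
If the stopping rule is something other than a plain count, one version-independent option is a small generator function that simply returns when the rule is met. The sketch below is an illustration added here, not part of the answer. `take_until_total` and its `limit` parameter are hypothetical names, and the rule (stop once the running total would exceed a limit) is only an example:

```python
def take_until_total(iterable, limit):
    """Yield items until the running total would exceed `limit`."""
    total = 0
    for item in iterable:
        total += item
        if total > limit:
            # A plain return ends a generator cleanly on every Python
            # version; no StopIteration has to be raised by hand.
            return
        yield item

a = [1, 2, 1, 4, 1, 1, 1, 1]
b = list(take_until_total((i for i in a if i == 1), 3))
print(b)  # [1, 1, 1]
```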
Chris_Rands
  • What advantage over the other answers does this have? – jpmc26 Feb 23 '17 at 01:32
  • @jpmc26 I don't think it's better than Martijn's solution for this exact purpose, but it's more generalisable because the conditions for terminating the generator could be anything, not just a counter. Also the OP asked specifically about a list comprehension and this is the closest valid syntax to that – Chris_Rands Feb 23 '17 at 08:04
  • 1
    Fair enough. Thanks. Since you posted this well after the other answers, you might want to work something about the flexibility advantage into your answer. – jpmc26 Feb 23 '17 at 08:11
3

Same solution just without islice:

filtered = (i for i in a if i == 1)
b = [next(filtered) for j in range(3)]  # Python 2 spelled this filtered.next()

BTW, pay attention if the generator is empty or yields fewer than 3 items: you'll get a StopIteration exception.

To prevent that, you may want to use next() with default value. For example:

b = [next(filtered, None) for j in range(3)]

And if you don't want 'None' in your list:

b = [i for i in b if i is not None]
madaniel
0

itertools.islice is the natural way to extract n items from a generator.

But you can also implement this yourself using a helper function. Just like the itertools.islice pseudo-code, we catch StopIteration to limit the number of items yielded.

This is more adaptable because it allows you to specify logic if n is greater than the number of items in your generator.

def take_n(gen, n):
    for _ in range(n):
        try:
            yield next(gen)
        except StopIteration:
            break

g = (i**2 for i in range(5))
res = list(take_n(g, 20))

print(res)

[0, 1, 4, 9, 16]
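
As an example of the "specify logic" point above, here is a hedged variation that pads with a fill value when the source runs short. `take_n_padded` and its `fill` parameter are hypothetical names introduced for illustration, and padding is just one of several behaviours you might choose:

```python
def take_n_padded(iterable, n, fill=None):
    """Yield exactly n items, padding with `fill` once the source runs out."""
    it = iter(iterable)  # accept any iterable, not only generators
    for _ in range(n):
        try:
            yield next(it)
        except StopIteration:
            # Custom logic for a source that is too short: pad instead of stopping.
            yield fill

g = (i**2 for i in range(3))
res = list(take_n_padded(g, 5, fill=0))
print(res)  # [0, 1, 4, 0, 0]
```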
jpp
-1
a = [1, 2, 1, 2, 1, 2]

b = [i for i in a if i == 1][:2]

This builds the full filtered list (evaluating every element of the original list) and only then slices it. It probably won't perform well on a long list, but it is easy to read and very fast to write.

jorgito
-4

use enumerate:

b = [n for i,n in enumerate(a) if n==1 and i<3]
  • 6
    That's simply wrong. First, this will discard everything except the first 3 items of `a` (the question wanted to limit `b`, not `a`), and it will process the whole iterable. It won't stop after finding the third item; it just discards everything thereafter (however, it will still check `n==1 and i < 3` for each element). – MSeifert Feb 27 '17 at 13:16
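
A quick way to see the difference described in this comment is an input whose matches all lie beyond index 2. The list below is made up for illustration, and `by_index` and `by_count` are hypothetical names:

```python
from itertools import islice

a = [2, 2, 2, 1, 1, 1]   # the matches sit at indexes 3, 4 and 5

# enumerate() limits the index into `a`, not the size of the result.
by_index = [n for i, n in enumerate(a) if n == 1 and i < 3]

# islice() limits how many matching items are taken.
by_count = list(islice((n for n in a if n == 1), 3))

print(by_index)  # []
print(by_count)  # [1, 1, 1]
```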