4

I'm wondering if there is not a way to compute the complement of a list comprehension in Python. Something like:

evens = [i in range(10) if i % 2 == 0]
odds  = [i in range(10) if i % 2 != 0]

is there a way to get both evens and odds in one call? For a very large list, or a more expensive if statement, I think this would save a lot of time.

Mike
  • 6,813
  • 4
  • 29
  • 50
  • I would recommend looking into sets: https://docs.python.org/2/library/sets.html – Nicholas Flees May 13 '14 at 20:50
  • 1
    @NicholasFlees How would sets help? – Waleed Khan May 13 '14 at 20:50
  • I'm not sure what you mean by complement. Do you mean a way to divide a list into two lists based on a condition? – David Robinson May 13 '14 at 20:52
  • @khachik I'd like to separate the true cases from the false ones, I'd like them to be in separate containers in the end. – Mike May 13 '14 at 20:52
  • Sets come equipped with a `difference` method via which the complement of the set is easily accessible. – Nicholas Flees May 13 '14 at 20:54
  • I think the best way to solve this problem is just to not use a list comprehension. It would probably scale better (what if you have a condition with more than two return values) and would be less roundabout and consequently probably more efficient. – Waleed Khan May 13 '14 at 20:55
  • @WaleedKhan, agree, just use one loop to fill both lists if performance is an issue. – khachik May 13 '14 at 20:56
  • 1
    Possible duplicate: [Python: split a list based on a condition](http://stackoverflow.com/q/949098/1199226) – itsjeyd May 13 '14 at 20:56
  • I was going to post http://pastebin.com/Eqsg8AjV, but question is already locked, unfortunately. My answer doesn't really fit linked question, but fit this one. I leave it here in case someone is interested generator/itertools based approach. – m.wasowski May 13 '14 at 21:51

4 Answers4

4

I believe this question has been asked before, but I am not finding the link currently.

If you are trying to get more than one predicate and you only want to iterate once over the original generator, then you will have to use a simple for loop.

evens = []
odds = []

for i in xrange(10):
   if i % 2 == 0: evens.append(i)
   else: odds.append(i)

As @dawg pointed out, the logic inside the loop can be made more concise using clever indexing.

for i in xrange(10):
   (evens,odds)[i%2].append(i)
merlin2011
  • 71,677
  • 44
  • 195
  • 329
1

itertools.groupby is what I'd use.

In [1]: import itertools as it

In [2]: key = lambda i: i%2 == 0

In [3]: l = list(range(10))

In [4]: l.sort(key=key)

In [5]: [list(i[1]) for i in it.groupby(l, key=key)]
Out[5]: [[1, 3, 5, 7, 9], [0, 2, 4, 6, 8]]
utdemir
  • 26,532
  • 10
  • 62
  • 81
  • Note that if you're concerned about the `if` condition being expensive (as the OP is), this will have far more `if` evaluations than the original one would. (Sorting requires asymptotically about `n log (n)` comparisons, and then another `n` evaluations have to be evaluated again for the groupby, compared to a total of `2n` comparisons for the original). – David Robinson May 13 '14 at 21:00
  • @DavidRobinson, AFAIK Python's built-in `sort` function calls `key` argument once for each input; so, it's just `n` conditions. And Python's `timsort` is pretty fast for long ascending/descending runs. – utdemir May 13 '14 at 21:13
0

I would do one of the following:

evens = [i in range(10) if i % 2 == 0]
odds  = [i in range(10) if i not in evens]

Or with better performances:

evens = [i in range(10) if i % 2 == 0]
evens_set = set(evens)
odds = [i in range(10) if i not in evens_set]

Working with set is better in performance, as the not in query costs O(1) instead of O(n) in lists

SomethingSomething
  • 11,491
  • 17
  • 68
  • 126
  • Sets would be nice here, but I'm often working with unhashable types, like a list of dicts. – Mike May 13 '14 at 20:55
  • If the list items are not hashable, you'll need to pay the O(n) complexity for each query on the other list, or make some workaround with hashing – SomethingSomething May 13 '14 at 20:59
0

In short, you can get both True and False cases in one call, but you'd still need to split them into two lists. You could do

range_10 = range(10)
odds = range_10[1::2]
evens = range_10[::2]

but the benefit of that would be negligible. (In fact, you'd be creating three lists instead of two). You'd only want to do that if the cost of range(10) was so high that it would offset creating two lists.

Using slicing like I did should be slightly faster than using a test and explicitly appending.

Hans Then
  • 10,935
  • 3
  • 32
  • 51