
I have several lists that all have the same number of entries (each list specifies one object property):

property_a = [545., 656., 5.4, 33.]
property_b = [ 1.2,  1.3, 2.3, 0.3]
...

and a list of flags of the same length:

good_objects = [True, False, False, True]

(which could easily be replaced with an equivalent index list):

good_indices = [0, 3]

What is the easiest way to generate new lists property_asel, property_bsel, ... which contain only the values indicated either by the True entries or the indices?

property_asel = [545., 33.]
property_bsel = [ 1.2, 0.3]
asked by fuenfundachtzig, edited by SilentGhost

5 Answers


You could just use a list comprehension:

property_asel = [val for is_good, val in zip(good_objects, property_a) if is_good]

or

property_asel = [property_a[i] for i in good_indices]

The latter is faster when there are fewer good_indices than elements in property_a, assuming good_indices is precomputed rather than generated on the fly.
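A rough way to check this claim on your own data is a `timeit` comparison. This is only a sketch: the list size, the roughly 1-in-10 selection ratio and the repeat count are arbitrary assumptions, and the timings will vary with machine, Python version and how many entries are selected.

```python
import random
import timeit

# Hypothetical test data: 10,000 objects, every 10th one marked good.
n = 10_000
property_a = [random.random() for _ in range(n)]
good_objects = [i % 10 == 0 for i in range(n)]
# Precomputed once, as the answer assumes.
good_indices = [i for i, ok in enumerate(good_objects) if ok]

def by_flags():
    # Walks all n entries of both lists.
    return [val for is_good, val in zip(good_objects, property_a) if is_good]

def by_indices():
    # Touches only the len(good_indices) selected entries.
    return [property_a[i] for i in good_indices]

assert by_flags() == by_indices()
print("flags:  ", timeit.timeit(by_flags, number=100))
print("indices:", timeit.timeit(by_indices, number=100))
```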


Edit: The first option is equivalent to itertools.compress, available since Python 2.7/3.1. See @Gary Kerr's answer.

import itertools
property_asel = list(itertools.compress(property_a, good_objects))
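Since the question has several property lists sharing one flag list, one option is to keep them in a dict and apply `compress` to each. This is a sketch; the `properties` dict and its keys are hypothetical names, not something from the question.

```python
from itertools import compress

good_objects = [True, False, False, True]
# Hypothetical container holding every property list under a name.
properties = {
    "a": [545., 656., 5.4, 33.],
    "b": [1.2, 1.3, 2.3, 0.3],
}

# One filtered list per property, all driven by the same flags.
selected = {name: list(compress(values, good_objects))
            for name, values in properties.items()}

print(selected)  # {'a': [545.0, 33.0], 'b': [1.2, 0.3]}
```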
answered by kennytm, edited by Devin
  • @fuen: Yes. It causes a lot of overhead on Python 2 (use [itertools.izip](http://docs.python.org/library/itertools.html#itertools.izip) instead), not so much on Python 3. This is because `zip` in Python 2 creates a new list, whereas on Python 3 it just returns a lazy iterator. – kennytm Jul 05 '10 at 11:37
  • OK, so I should stick to your 2nd proposal then, because this makes up the central part of my code. – fuenfundachtzig Jul 05 '10 at 11:39
  • @85: why are you worrying about performance? Write what you have to do; if it is slow, then test to find the bottlenecks. – Gary Kerr Jul 05 '10 at 11:39
  • @PreludeAndFugue: If there are two equivalent options, it's good to know which one is faster and to use that one right away. – fuenfundachtzig Jul 05 '10 at 11:42
  • I suspect the second is *slower*, because where did that good_indices list come from in the first place? Probably by enumerating over all of good_objects and saving the indices where good_objects[i] is True. So no savings after all, plus you had to build a second list. Use the first option, with izip in Py2 or zip in Py3, read both lists once, and directly create the desired output with no intermediate lists. – PaulMcG Jul 05 '10 at 20:29
  • You can just use `from itertools import izip` and use that instead of `zip` in the first example. That creates an iterator, same as Python 3. – Chris B. Jul 05 '10 at 20:34
  • @Paul McGuire: You're right, I'm looping over the properties and applying some tests to figure out which objects are good. So in principle it would be possible to build the lists directly in that loop. This is also probably the fastest way. – fuenfundachtzig Jul 05 '10 at 21:20
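What PaulMcG and the OP describe, building the selected lists in the same loop that decides which objects are good, could be sketched like this. The `is_good` test is a hypothetical placeholder (here simply `b < 1.25`, chosen because it happens to reproduce the question's flags); substitute the real criteria.

```python
property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]

def is_good(a, b):
    # Hypothetical stand-in for the real tests; it reproduces
    # good_objects = [True, False, False, True] for this data.
    return b < 1.25

property_asel, property_bsel = [], []
# Single pass: decide and collect at the same time, no flag or index list.
for a, b in zip(property_a, property_b):
    if is_good(a, b):
        property_asel.append(a)
        property_bsel.append(b)

print(property_asel, property_bsel)  # [545.0, 33.0] [1.2, 0.3]
```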

I see 2 options.

  1. Using numpy:

    import numpy
    property_a = numpy.array([545., 656., 5.4, 33.])
    property_b = numpy.array([ 1.2,  1.3, 2.3, 0.3])
    good_objects = [True, False, False, True]
    good_indices = [0, 3]
    property_asel = property_a[good_objects]
    property_bsel = property_b[good_indices]
    
  2. Using a list comprehension with zip:

    property_a = [545., 656., 5.4, 33.]
    property_b = [ 1.2,  1.3, 2.3, 0.3]
    good_objects = [True, False, False, True]
    good_indices = [0, 3]
    property_asel = [x for x, y in zip(property_a, good_objects) if y]
    property_bsel = [property_b[i] for i in good_indices]
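With NumPy you can also convert between the two selector forms. A sketch, assuming NumPy is installed: `np.flatnonzero` turns a boolean mask into an index array, and setting `True` at given positions of an all-`False` mask goes the other way.

```python
import numpy as np

property_a = np.array([545., 656., 5.4, 33.])
good_objects = np.asarray([True, False, False, True], dtype=bool)

# Boolean mask -> integer indices.
good_indices = np.flatnonzero(good_objects)

# Integer indices -> boolean mask of the same length as the data.
mask = np.zeros(len(property_a), dtype=bool)
mask[good_indices] = True

# Both forms select the same elements.
assert (property_a[mask] == property_a[good_indices]).all()
print(good_indices.tolist(), property_a[mask].tolist())  # [0, 3] [545.0, 33.0]
```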
    
answered by Wolph
  • Using Numpy is a good suggestion since the OP seems to want to store numbers in lists. A two-dimensional array would be even better. – Philipp Jul 05 '10 at 13:35
  • It's also a good suggestion because this will be very familiar syntax to users of R, where this kind of selection is very powerful, especially when nested and/or multidimensional. – Thomas Browne May 25 '14 at 21:11
  • `[property_b[i] for i in good_indices]` is a good option if you're not using `numpy`. – franchb Aug 08 '16 at 20:35
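Philipp's two-dimensional array suggestion might look like this: stack the property lists as rows, and a single boolean mask over the columns selects every property at once. A sketch, assuming NumPy is installed; the row order (row 0 = property_a, row 1 = property_b) is an arbitrary choice.

```python
import numpy as np

# Rows are properties, columns are objects (row order is our choice).
props = np.array([
    [545., 656., 5.4, 33.],   # property_a
    [1.2, 1.3, 2.3, 0.3],     # property_b
])
good_objects = np.array([True, False, False, True])

# Select the good columns for every property in one step.
selected = props[:, good_objects]
property_asel, property_bsel = selected   # unpack the two rows

print(selected.tolist())  # [[545.0, 33.0], [1.2, 0.3]]
```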

Use the built-in function zip:

property_asel = [a for (a, truth) in zip(property_a, good_objects) if truth]

EDIT

I was just looking at the new features in Python 2.7: there is now a function in the itertools module that does the same thing as the code above.

http://docs.python.org/library/itertools.html#itertools.compress

itertools.compress('ABCDEF', [1,0,1,0,1,1])  # yields A, C, E, F
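Applied to the question's data, `compress` takes the data first and the selectors second and returns a lazy iterator, so wrap it in `list()` when you need a list. Any truthy/falsy values work as selectors, and iteration stops when either input runs out. A minimal sketch:

```python
from itertools import compress

property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]
good_objects = [True, False, False, True]

# compress(data, selectors) yields the data items whose selector is truthy.
property_asel = list(compress(property_a, good_objects))
property_bsel = list(compress(property_b, good_objects))
print(property_asel, property_bsel)  # [545.0, 33.0] [1.2, 0.3]

# 1/0 work as selectors too; the result stops at the shorter input.
print(list(compress("ABCDEF", [1, 0, 1])))  # ['A', 'C']
```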
answered by Gary Kerr
  • I'm underwhelmed by the use of `itertools.compress` here. The list comprehension is *far* more readable, without having to dig up what the heck compress is doing. – PaulMcG Jul 05 '10 at 20:32
  • Hm, I find the code using compress much more readable :) Maybe I'm biased, because it does exactly what I want. – fuenfundachtzig Jul 09 '10 at 15:52
  • Why don't you provide an example with `itertools.compress` instead of copy pasting the documentation example? – Nicolas Gervais Sep 24 '20 at 12:42

Assuming you only have the list of items and a list of true/required indices, this should be the fastest:

property_asel = [property_a[index] for index in good_indices]

This way the selection only does as many iterations as there are true/required indices. If you have many property lists that are all filtered by a single list of true/false flags, you can build the index list with the same list-comprehension approach:

good_indices = [index for index, item in enumerate(good_objects) if item]

This iterates through each item in good_objects (while remembering its index with enumerate) and returns only the indices where the item is true.
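When many property lists share one flag list, you can build `good_indices` once and reuse it for every list. A sketch; the helper name `select` is hypothetical:

```python
property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]
good_objects = [True, False, False, True]

# Computed once: a single pass over the flags.
good_indices = [index for index, item in enumerate(good_objects) if item]

def select(values, indices):
    """Hypothetical helper: pick the values at the given positions."""
    return [values[i] for i in indices]

# Each further list costs only len(good_indices) lookups.
property_asel = select(property_a, good_indices)
property_bsel = select(property_b, good_indices)
print(good_indices, property_asel, property_bsel)  # [0, 3] [545.0, 33.0] [1.2, 0.3]
```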


For anyone not familiar with list comprehensions, here is a plain-English version, with the code words in backticks:

list the `index` `for` every `index, item` pair `in` `enumerate(good_objects)`, `if` (that is, where) the `item` is True
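Unrolled into an ordinary loop, the same comprehension reads as follows; this is simply the equivalent `for` statement, shown for comparison:

```python
good_objects = [True, False, False, True]

good_indices = []
# enumerate yields (index, item) pairs: (0, True), (1, False), ...
for index, item in enumerate(good_objects):
    if item:                       # keep only positions flagged True
        good_indices.append(index)

print(good_indices)  # [0, 3]
```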

answered by Eyrofire

The Matlab and Scilab languages offer a simpler and more elegant syntax than Python for what you're asking, so I think the best you can do is mimic Matlab/Scilab using the NumPy package. This makes the solution to your problem very concise and elegant:

from numpy import *
property_a = array([545., 656., 5.4, 33.])
property_b = array([ 1.2,  1.3, 2.3, 0.3])
good_objects = [True, False, False, True]
good_indices = [0, 3]
property_asel = property_a[good_objects]
property_bsel = property_b[good_indices]

NumPy tries to mimic Matlab/Scilab, but this comes at a cost: you need to wrap every list in a call to "array", which clutters your script (a problem Matlab/Scilab don't have). Note that this solution is aimed at arrays of numbers, which is the case in your example.

answered by FredAndre
  • Nowhere in the question does he mention NumPy -- there is no need to express your opinion on NumPy vs Matlab. Python lists are **not** the same thing as NumPy arrays, even if they both roughly correspond to vectors. (Python lists are like Matlab cell arrays -- each element can have a different data type. NumPy arrays are more restricted in order to enable certain optimizations). You can get similar syntax to your example via Python's built-in `filter` or the external library `pandas`. If you're going to swap languages, you could also try R, but *that's not what the question is asking*. – Livius Jun 14 '14 at 22:39
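The built-in `filter` mentioned in the last comment can do this too, although the flags have to be paired with the values first. One possible sketch, using `zip` and `operator.itemgetter` (this particular combination is our choice; the comment does not spell it out):

```python
from operator import itemgetter

property_a = [545., 656., 5.4, 33.]
good_objects = [True, False, False, True]

# Pair each value with its flag, keep pairs whose flag (item 1) is truthy,
# then strip the flag off again (item 0).
pairs = filter(itemgetter(1), zip(property_a, good_objects))
property_asel = list(map(itemgetter(0), pairs))

print(property_asel)  # [545.0, 33.0]
```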