Python: Select subset from list based on index set

Question

I have several lists, all with the same number of entries (each list specifies one object property):

property_a = [545., 656., 5.4, 33.]
property_b = [ 1.2,  1.3, 2.3, 0.3]
...

and a list of flags of the same length:

good_objects = [True, False, False, True]

which could easily be substituted with an equivalent index list:

good_indices = [0, 3]

What is the easiest way to generate new lists property_asel, property_bsel, ... which contain only the values indicated either by the True entries or the indices?

property_asel = [545., 33.]
property_bsel = [ 1.2, 0.3]

score 177 · Accepted Answer · edited Dec 16 '19 at 07:59

You could just use a list comprehension:

property_asel = [val for is_good, val in zip(good_objects, property_a) if is_good]

or

property_asel = [property_a[i] for i in good_indices]

The latter one is faster because there are fewer good_indices than the length of property_a, assuming good_indices are precomputed instead of generated on-the-fly.

Edit: The first option is equivalent to itertools.compress available since Python 2.7/3.1. See @Gary Kerr's answer.

import itertools
property_asel = list(itertools.compress(property_a, good_objects))
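As a quick check, the three variants above can be run side by side on the lists from the question. This is only a minimal sketch; the names `by_flags`, `by_indices` and `by_compress` are introduced here for illustration and do not appear in the question.

```python
import itertools

property_a = [545., 656., 5.4, 33.]
good_objects = [True, False, False, True]
good_indices = [0, 3]

# Variant 1: pair each value with its flag and keep the flagged ones.
by_flags = [val for is_good, val in zip(good_objects, property_a) if is_good]

# Variant 2: look up only the positions listed in good_indices.
by_indices = [property_a[i] for i in good_indices]

# Variant 3: itertools.compress yields the values whose selector is truthy.
by_compress = list(itertools.compress(property_a, good_objects))

print(by_flags, by_indices, by_compress)
```

All three should produce the same selection, so the choice comes down to which input (flags or indices) is already at hand.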

Devin

answered Jul 05 '10 at 11:32

kennytm

@fuen: Yes. Causes a lot on Python 2 (use [itertools.izip](http://docs.python.org/library/itertools.html#itertools.izip) instead), not so much on Python 3. This is because `zip` in Python 2 creates a new list, while in Python 3 it just returns a lazy iterator. – kennytm Jul 05 '10 at 11:37
OK, so I should stick to your 2nd proposal then, because this makes up the central part of my code. – fuenfundachtzig Jul 05 '10 at 11:39
@85: why are you worrying about performance? Write what you have to do, if it is slow, then test to find bottlenecks. – Gary Kerr Jul 05 '10 at 11:39
@PreludeAndFugue: If there are two equivalent options it's good to know which one is faster, and use that one right away. – fuenfundachtzig Jul 05 '10 at 11:42
I suspect the second is *slower*, because where did that good_indices list come from in the first place? Probably by enumerating over all of good_objects and saving the indices where good_objects[i] is True. So no savings after all, plus you had to build a second list. Use the first option, with izip in Py2 or zip in Py3, read both lists once, and directly create the desired output with no intermediate lists. – PaulMcG Jul 05 '10 at 20:29
You can just use `from itertools import izip` and use that instead of `zip` in the first example. That creates an iterator, same as Python 3. – Chris B. Jul 05 '10 at 20:34
@Paul McGuire: You're right, I'm looping over the properties and applying some tests to figure out which objects are good. So in principle it would be possible to build the lists directly in that loop. This is also probably the fastest way. – fuenfundachtzig Jul 05 '10 at 21:20
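The idea from the last comments, building the selected lists in the same loop that tests each object, might look like the sketch below. The function `is_good` and its thresholds are hypothetical stand-ins for whatever checks the real code applies; they are not from the question.

```python
property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]


def is_good(a, b):
    """Hypothetical quality test; replace with the real criteria."""
    return a > 10. and b < 1.25


property_asel = []
property_bsel = []
# Walk all property lists in lockstep and keep an object's values
# only if it passes the test, so no flag or index list is needed.
for a, b in zip(property_a, property_b):
    if is_good(a, b):
        property_asel.append(a)
        property_bsel.append(b)

print(property_asel)  # [545.0, 33.0]
print(property_bsel)  # [1.2, 0.3]
```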

score 35 · Answer 2 · answered Jul 05 '10 at 11:34

I see 2 options.

Using numpy:

import numpy

property_a = numpy.array([545., 656., 5.4, 33.])
property_b = numpy.array([ 1.2,  1.3, 2.3, 0.3])
good_objects = [True, False, False, True]
good_indices = [0, 3]
property_asel = property_a[good_objects]  # boolean mask
property_bsel = property_b[good_indices]  # integer index list

Using a list comprehension with zip:

property_a = [545., 656., 5.4, 33.]
property_b = [ 1.2,  1.3, 2.3, 0.3]
good_objects = [True, False, False, True]
good_indices = [0, 3]
property_asel = [x for x, y in zip(property_a, good_objects) if y]
property_bsel = [property_b[i] for i in good_indices]

Wolph

Using Numpy is a good suggestion since the OP seems to want to store numbers in lists. A two-dimensional array would be even better. – Philipp Jul 05 '10 at 13:35
It's also a good suggestion because this will be very familiar syntax to users of R, where this kind of selection is very powerful, especially when nested and/or multidimensional. – Thomas Browne May 25 '14 at 21:11
`[property_b[i] for i in good_indices]` is a good one for using without `numpy` – franchb Aug 08 '16 at 20:35

Gary Kerr · Answer 3 · 2010-07-05T14:14:55.813

18

Use the built-in function zip:

property_asel = [a for (a, truth) in zip(property_a, good_objects) if truth]

EDIT

Just looking at the new features of 2.7. There is now a function in the itertools module which is similar to the above code.

http://docs.python.org/library/itertools.html#itertools.compress

itertools.compress('ABCDEF', [1,0,1,0,1,1]) =>
  A, C, E, F
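Applied to the lists from the question, the same call could look like this (a minimal sketch, reusing the question's variable names):

```python
from itertools import compress

property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]
good_objects = [True, False, False, True]

# compress() returns a lazy iterator, so wrap it in list() to get a list.
property_asel = list(compress(property_a, good_objects))
property_bsel = list(compress(property_b, good_objects))

print(property_asel)  # [545.0, 33.0]
print(property_bsel)  # [1.2, 0.3]
```

Note that `compress` stops at the shorter of its two arguments, so a flag list of the wrong length goes unnoticed.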

edited Jul 05 '10 at 14:14

answered Jul 05 '10 at 11:34

Gary Kerr

I'm underwhelmed by the use of `itertools.compress` here. The list comprehension is *far* more readable, without having to dig up what the heck compress is doing. – PaulMcG Jul 05 '10 at 20:32
Hm, I find the code using compress much more readable :) Maybe I'm biased, because it does exactly what I want. – fuenfundachtzig Jul 09 '10 at 15:52
Why don't you provide an example with `itertools.compress` instead of copy pasting the documentation example? – Nicolas Gervais Sep 24 '20 at 12:42

score 11 · Answer 4 · answered Mar 11 '14 at 16:54

Assuming you only have the list of items and a list of true/required indices, this should be the fastest:

property_asel = [ property_a[index] for index in good_indices ]

This means the property selection only does as many iterations as there are true/required indices. If you have many property lists that all follow a single flags (True/False) list, you can create an index list from it using the same list comprehension principles:

good_indices = [ index for index, item in enumerate(good_objects) if item ]

This iterates through each item in good_objects (while remembering its index with enumerate) and returns only the indices where the item is true.

For anyone not getting the list comprehension, here is an English prose version with the code highlighted in bold:

**list** the **index** **for** every group of **index, item** that exists **in** an **enumerat**ion of **good objects**, **if** (where) the **item** is True
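Putting both steps together for several property lists might look like the sketch below. The helper `select` is a hypothetical name introduced here for illustration; it simply wraps the first comprehension.

```python
property_a = [545., 656., 5.4, 33.]
property_b = [1.2, 1.3, 2.3, 0.3]
good_objects = [True, False, False, True]

# Build the index list once from the flags ...
good_indices = [index for index, item in enumerate(good_objects) if item]


def select(values, indices):
    """Hypothetical helper: return the entries of values at the given indices."""
    return [values[index] for index in indices]


# ... then reuse it for every property list.
property_asel = select(property_a, good_indices)
property_bsel = select(property_b, good_indices)

print(good_indices)   # [0, 3]
print(property_asel)  # [545.0, 33.0]
print(property_bsel)  # [1.2, 0.3]
```

The flag list is scanned only once, and each later selection touches only the good entries.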

FredAndre · Answer 5 · 2014-06-15T17:40:20.817

Matlab and Scilab languages offer a simpler and more elegant syntax than Python for the question you're asking, so I think the best you can do is to mimic Matlab/Scilab by using the Numpy package in Python. By doing this the solution to your problem is very concise and elegant:

from numpy import *
property_a = array([545., 656., 5.4, 33.])
property_b = array([ 1.2,  1.3, 2.3, 0.3])
good_objects = [True, False, False, True]
good_indices = [0, 3]
property_asel = property_a[good_objects]
property_bsel = property_b[good_indices]

Numpy tries to mimic Matlab/Scilab but it comes at a cost: you need to declare every list with the keyword "array", something which will clutter your script (this problem doesn't exist with Matlab/Scilab). Note that this solution is restricted to arrays of numbers, which is the case in your example.

Nowhere in the question does he mention NumPy -- there is no need to express your opinion on NumPy vs Matlab. Python lists are **not** the same thing as NumPy arrays, even if they both roughly correspond to vectors. (Python lists are like Matlab cell arrays -- each element can have a different data type. NumPy arrays are more restricted in order to enable certain optimizations). You can get similar syntax to your example via Python's built-in `filter` or the external library `pandas`. If you're going to swap languages, you could also try R, but *that's not what the question is asking*. – Livius Jun 14 '14 at 22:39
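For reference, a pure-Python version based on the built-in `filter` mentioned in this comment could be sketched as follows. The intermediate name `pairs` is an assumption made for illustration, and this is only one of several ways to use `filter` here.

```python
property_a = [545., 656., 5.4, 33.]
good_objects = [True, False, False, True]

# filter() keeps the (value, flag) pairs whose flag is truthy;
# the comprehension then drops the flag again.
pairs = filter(lambda pair: pair[1], zip(property_a, good_objects))
property_asel = [value for value, flag in pairs]

print(property_asel)  # [545.0, 33.0]
```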
