337

I need to choose some elements from a given list, knowing their indices. Say I would like to create a new list that contains the elements at indices 1, 2 and 5 of the given list [-2, 1, 5, 3, 8, 5, 6]. What I did is:

a = [-2,1,5,3,8,5,6]
b = [1,2,5]
c = [ a[i] for i in b]

Is there any better way to do it? Something like c = a[b]?

TerryA
hoang tran
  • by the way, I found another solution here. I haven't tested it yet, but I think I can post it here if you are interested: http://code.activestate.com/recipes/577953-get-multiple-elements-from-a-list/ – hoang tran Aug 16 '13 at 12:44
  • That is the same solution as mentioned in the question, but wrapped in a `lambda` function. – Will Jun 26 '17 at 17:02

11 Answers

323

You can use operator.itemgetter:

from operator import itemgetter 
a = [-2, 1, 5, 3, 8, 5, 6]
b = [1, 2, 5]
print(itemgetter(*b)(a))
# Result:
(1, 5, 5)

Or you can use numpy:

import numpy as np
a = np.array([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]
print(list(a[b]))
# Result:
[1, 5, 5]

But really, your current solution is fine. It's probably the neatest out of all of them.
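One caveat, noted in the comments below: with a single index, `itemgetter` returns the bare element rather than a 1-tuple. A minimal sketch of a wrapper that always gives back a list (the helper name `pick_items` is just for illustration):

from operator import itemgetter

def pick_items(seq, indices):
    # itemgetter(i) returns a single element, itemgetter(i, j, ...) a tuple,
    # so normalize both cases to a list.
    if len(indices) == 1:
        return [seq[indices[0]]]
    return list(itemgetter(*indices)(seq))

a = [-2, 1, 5, 3, 8, 5, 6]
print(pick_items(a, [1, 2, 5]))  # [1, 5, 5]
print(pick_items(a, [2]))        # [5]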

TerryA
  • +1 for mentioning that `c = [a[i] for i in b]` is perfectly fine. Note that the `itemgetter` solution will not do the same thing if b has fewer than 2 elements. – flornquake Aug 16 '13 at 11:35
  • **Side note**: Using _itemgetter_ doesn't work when working in multi-process; numpy works great in multi-process. – Lior Magen Mar 29 '16 at 10:26
  • Additional comment: `a[b]` works **only** when `a` is a **numpy** array, i.e. you create it with a numpy function. – Ludwig Zhou Aug 07 '17 at 09:11
  • I have benchmarked the non-numpy options and itemgetter appears to be the fastest, even slightly faster than simply typing out the desired indexes inside parentheses, using Python 3.44 – ragardner Oct 16 '17 at 09:42
  • @citizen2077, can you give an example of the syntax you describe? – alancalvitti Jan 07 '19 at 19:05
65

Alternatives:

>>> list(map(a.__getitem__, b))
[1, 5, 5]

>>> import operator
>>> operator.itemgetter(*b)(a)
(1, 5, 5)
falsetru
  • The problem w/ the first one is that `__getitem__` doesn't seem to be composable, e.g. how to map the type of the item? `map(type(a.__getitem__), b)` – alancalvitti Jan 07 '19 at 19:17
  • @alancalvitti, `lambda x: type(a.__getitem__(x)), b`. In this case using `[..]` is more compact: `lambda x: type(a[x]), b` – falsetru Jan 08 '19 at 00:01
  • just convert back into a list: `list(map(a.__getitem__, b))` – tv87 Mar 21 '21 at 19:09
  • How can I use the same method for indices stored in a 2D list? For example, I have `main_arr =[27.5, 31.0, 29.8, 29.8, 32.3, 34.4, 28.8, 31.0, 32.2, 26.0, 29.4, 31.0, 29.3, 29.3, 30.9, 30.7, 29.9, 29.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 56.6, 0.0, 0.0, 0.0, 0.0]`, and I want to get values at indices given by `pixels = [[2,5,8,11,14,17], [1,4,7,10,13,16], [0,3,6,9,12,14]]`. One way I can think of is calling your method in the loop. But is there a more elegant way? – McLovin Aug 08 '22 at 20:29
  • @Solen'ya, `[[main_arr[p] for p in ps] for ps in pixels]` or `[operator.itemgetter(*ps)(main_arr) for ps in pixels]` – falsetru Aug 09 '22 at 00:08
  • This is what I ended up doing but I was asking if there is a better way . – McLovin Aug 09 '22 at 00:09
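To make the nested-index question from the last few comments concrete, here is a minimal sketch (the names `main_arr` and `pixels` are taken from the comment, with shortened data):

main_arr = [27.5, 31.0, 29.8, 29.8, 32.3, 34.4, 28.8, 31.0, 32.2]
pixels = [[2, 5, 8], [1, 4, 7], [0, 3, 6]]

# One inner list comprehension per row of indices.
rows = [[main_arr[i] for i in row] for row in pixels]
print(rows)  # [[29.8, 34.4, 32.2], [31.0, 32.3, 31.0], [27.5, 29.8, 28.8]]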
15

Another solution could be via pandas Series:

import pandas as pd

a = pd.Series([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]
c = a[b]

You can then convert c back to a list if you want:

c = list(c)
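Note that `a[b]` on a Series selects by index *label*; it matches positions here only because the Series has the default 0-based index. A small sketch using `.iloc`, which always selects by integer position, may be the safer habit:

import pandas as pd

a = pd.Series([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]

# .iloc indexes strictly by position, regardless of the index labels.
c = a.iloc[b].tolist()
print(c)  # [1, 5, 5]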
BossaNova
11

Basic (and not very extensive) testing comparing the execution times of the five supplied approaches:

import operator
import numpy as np

def numpyIndexValues(a, b):
    na = np.array(a)
    nb = np.array(b)
    out = list(na[nb])
    return out

def mapIndexValues(a, b):
    out = map(a.__getitem__, b)
    return list(out)

def getIndexValues(a, b):
    out = operator.itemgetter(*b)(a)
    return out

def pythonLoopOverlap(a, b):
    c = [ a[i] for i in b]
    return c

multipleListItemValues = lambda searchList, ind: [searchList[i] for i in ind]

using the following input:

a = list(range(0, 10000000))
b = list(range(500, 500000))

The simple Python loop was the quickest, with the lambda operation a close second; mapIndexValues and getIndexValues were consistently pretty similar, with the numpy method significantly slower after converting the lists to numpy arrays. If the data is already in numpy arrays, the numpyIndexValues method with the numpy.array conversion removed is the quickest.

numpyIndexValues -> time:1.38940598 (when converted the lists to numpy arrays)
numpyIndexValues -> time:0.0193445 (using numpy array instead of python list as input, and conversion code removed)
mapIndexValues -> time:0.06477512099999999
getIndexValues -> time:0.06391049500000001
multipleListItemValues -> time:0.043773591
pythonLoopOverlap -> time:0.043021754999999995
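The timing harness itself isn't shown in the answer; a minimal sketch of how the comparison could be reproduced with `timeit` (reusing the functions defined above, with an arbitrary repeat count) might look like this:

import timeit

a = list(range(0, 10000000))
b = list(range(500, 500000))

candidates = {
    "numpyIndexValues": numpyIndexValues,
    "mapIndexValues": mapIndexValues,
    "getIndexValues": getIndexValues,
    "multipleListItemValues": multipleListItemValues,
    "pythonLoopOverlap": pythonLoopOverlap,
}

for name, fn in candidates.items():
    # Each candidate gets the same list inputs and the same number of runs.
    t = timeit.timeit(lambda: fn(a, b), number=10)
    print("{} -> time:{}".format(name, t))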
Don Smythe
  • I do not know what Python interpreter you use, but the first method `numpyIndexValues` does not work since `a`, `b` are of type `range`. I am guessing that you meant to convert `a`, `b` to `numpy.ndarray`s first? – strpeter Oct 14 '15 at 08:21
  • @strpeter Yes, I wasn't comparing apples with apples: I had created numpy arrays as input in the test case for numpyIndexValues. I have fixed this now and all use the same lists as input. – Don Smythe Oct 18 '15 at 05:23
3

Here's a simpler way:

a = [-2,1,5,3,8,5,6]
b = [1,2,5]
c = [e for i, e in enumerate(a) if i in b]
Max Sirwa
  • The OP way of `[a[i] for i in b]` is simpler than what you suggest. – Muhammad Yasirroni Feb 13 '22 at 12:45
  • I wonder how that is *"simpler"*? You are iterating over ***all*** elements of `a`, checking if their index is in `b` and adding them. On the other hand, the code in the question simply takes the elements from `a` which are at the indexes in `b`. Sounds simpler to me... – Tomerikoo Jun 09 '22 at 10:34
3

List comprehension is clearly the most immediate and easiest to remember - in addition to being quite pythonic!

In any case, among the proposed solutions, it is not the fastest (I have run my test on Windows using Python 3.8.3):

import timeit
from itertools import compress
import random
from operator import itemgetter
import numpy as np
import pandas as pd

__N_TESTS__ = 10_000

vector = [str(x) for x in range(100)]
filter_indeces = sorted(random.sample(range(100), 10))
filter_boolean = random.choices([True, False], k=100)

# Different ways for selecting elements given indeces

# list comprehension
def f1(v, f):
   return [v[i] for i in f]

# itemgetter
def f2(v, f):
   return itemgetter(*f)(v)

# using pandas.Series
# this is immensely slow
def f3(v, f):
   return list(pd.Series(v)[f])

# using map and __getitem__
def f4(v, f):
   return list(map(v.__getitem__, f))

# using enumerate!
def f5(v, f):
   return [x for i, x in enumerate(v) if i in f]

# using numpy array
def f6(v, f):
   return list(np.array(v)[f])

print("{:30s}:{:f} secs".format("List comprehension", timeit.timeit(lambda:f1(vector, filter_indeces), number=__N_TESTS__)))
print("{:30s}:{:f} secs".format("Operator.itemgetter", timeit.timeit(lambda:f2(vector, filter_indeces), number=__N_TESTS__)))
print("{:30s}:{:f} secs".format("Using Pandas series", timeit.timeit(lambda:f3(vector, filter_indeces), number=__N_TESTS__)))
print("{:30s}:{:f} secs".format("Using map and __getitem__", timeit.timeit(lambda: f4(vector, filter_indeces), number=__N_TESTS__)))
print("{:30s}:{:f} secs".format("Enumeration (Why anyway?)", timeit.timeit(lambda: f5(vector, filter_indeces), number=__N_TESTS__)))

My results are:

List comprehension :0.007113 secs
Operator.itemgetter :0.003247 secs
Using Pandas series :2.977286 secs
Using map and getitem :0.005029 secs
Enumeration (Why anyway?) :0.135156 secs
Numpy :0.157018 secs

nikeros
2

I'm sure this has already been considered: if the number of indices in `b` is small and constant, one could just write the result like:

c = [a[b[0]]] + [a[b[1]]] + [a[b[2]]]

Or even simpler, if the indices themselves are constants...

c = [a[1]] + [a[2]] + [a[5]]

Or if there is a consecutive range of indices...

c = a[1:3] + [a[5]]
ecp
1

Static indexes and small list?

Don't forget that if the list is small and the indexes don't change, as in your example, sometimes the best thing is to use sequence unpacking:

_,a1,a2,_,_,a3,_ = a

The performance is much better and you can also save one line of code:

%timeit _,a1,b1,_,_,c1,_ = a
10000000 loops, best of 3: 154 ns per loop
%timeit itemgetter(*b)(a)
1000000 loops, best of 3: 753 ns per loop
%timeit [a[i] for i in b]
1000000 loops, best of 3: 777 ns per loop
%timeit map(a.__getitem__, b)
1000000 loops, best of 3: 1.42 µs per loop
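One caveat worth adding: the unpacking pattern has to name every position in the list, so it only works when the length of `a` is fixed and known. A quick sketch:

a = [-2, 1, 5, 3, 8, 5, 6]
_, a1, a2, _, _, a3, _ = a   # exactly seven targets for seven elements
print(a1, a2, a3)            # 1 5 5

try:
    _, x, y = a              # wrong number of targets
except ValueError as err:
    print(err)               # too many values to unpack (expected 3)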
G M
0

The results for the latest pandas==1.4.2 as of June 2022 are as follows.

Note that simple subscripting with the positions, as in `[1, 2, 5]`, is no longer possible (it raises a KeyError); you have to go through `.iloc`. The benchmark results are also faster.

import timeit
import pandas as pd
print(pd.__version__)
# 1.4.2

pd.Series([-2, 1, 5, 3, 8, 5, 6])[1, 2, 5]
# KeyError: 'key of type tuple not found and not a MultiIndex'

pd.Series([-2, 1, 5, 3, 8, 5, 6]).iloc[[1, 2, 5]].tolist()
# [1, 5, 5]

def extract_multiple_elements():
    return pd.Series([-2, 1, 5, 3, 8, 5, 6]).iloc[[1, 2, 5]].tolist()

__N_TESTS__ = 10_000
t1 = timeit.timeit(extract_multiple_elements, number=__N_TESTS__)
print(round(t1, 3), 'seconds')
# 1.035 seconds
Keiku
-1

My answer does not use numpy or python collections.

One trivial way to find elements would be as follows:

a = [-2, 1, 5, 3, 8, 5, 6]
b = [1, 2, 5]
c = [i for i in a if i in b]

Drawback: This method may not work for larger lists. Using numpy is recommended for larger lists.
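As the comments below point out, this filters by *value* rather than by *index*; it only appears to give the right answer for the question's data because the values at positions 1, 2 and 5 happen to coincide with values present in `b`. A small counterexample:

a = [10, 20, 30, 40]
b = [1, 2]
print([i for i in a if i in b])  # [] -- no element of a equals 1 or 2
print([a[i] for i in b])         # [20, 30] -- what the question asks for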

falsetru
  • No need to iterate `a`. `[a[i] for i in b]` – falsetru Sep 22 '14 at 12:38
  • This method doesn't even work in any other case. What if `a` had another 5 in it? – TerryA Jul 21 '15 at 21:47
  • IMO, faster to do this sort of intersection using [sets](https://docs.python.org/3/tutorial/datastructures.html#sets) – sirgogo Mar 15 '17 at 21:13
  • If you are worried about IndexErrors if b has numbers that exceed a's size, try `[a[i] if i – 576i Aug 09 '18 at 09:23
  • This doesn't answer the question. It is not even what was asked for. `b` is a list of ***indexes*** to take from `a`, not elements. You are simply taking the ***elements*** in `a` which also exist in `b`. Again, not what is asked for... – Tomerikoo Mar 06 '22 at 10:15
  • This post should be closed, the reason is explained by @Tomerikoo – FLAK-ZOSO Jun 09 '22 at 10:28
-1

Kind of pythonic way:

c = [x for x in a if a.index(x) in b]
  • I would say this is less "pythonic" than even the OP's example -- you've managed to turn their `O(n)` solution into an `O(n^2)` solution while also nearly doubling the length of the code. You will also want to note that this approach will fail if the list contains objects with fuzzy or partial equality, e.g. if `a` contains `float('nan')`, this will **always** raise a `ValueError`. – Brian61354270 Mar 26 '20 at 19:30
  • This will give wrong results if `a` has duplicate items (`index` returns the index of the ***first*** occurrence of the element) – Tomerikoo Mar 06 '22 at 10:16