
I have a large Python list `l` of objects of any type, and another large list `i` (or even a NumPy array) of integer indexes pointing to some elements within list `l`.

The question is: what is the fastest (most efficient) way of creating another list `l2` that contains the elements of `l` at the indexes from `i`?

The easiest way is to use a list comprehension:

l2 = [l[si] for si in i]
# for the NumPy-array case, np.nditer(i) can be used instead of i

But is this the fastest possible way?

A list comprehension is a Python-level loop, so it might be slow for large lists. Maybe there is some built-in method in Python's standard library, written in efficient C, that achieves exactly this task? Or maybe NumPy has such a method for indexing a Python list by a NumPy array?

Maybe there is some simple and fast function in the standard Python library that does the same as NumPy's np.take, as in the imaginary code below:

import listtools
l2 = listtools.take(l, indexes)
Arty
  • Exactly the same question is answered here: https://stackoverflow.com/questions/9008533/access-list-of-items-with-list-of-indices. There the suggestion was to use a generator instead of a list comprehension – Yossi Levi Oct 03 '20 at 06:59
  • @YossiLevi I'm not convinced that a generator is the fastest way; also, for my task you need to get a list back out of the generator by doing `l2 = list(generator_())`, which is another slow operation. Also, in that question there's only one answer, and I expect there are a dozen other fast methods that were not mentioned. – Arty Oct 03 '20 at 07:10
  • Lists can only be indexed one item (or slice) at a time. And iterating on a list is faster than iterating on an array (and `nditer` does not help). So that basic list comprehension makes the most sense. Remember, it is only copying references (see the sketch after these comments). – hpaulj Oct 03 '20 at 07:19
  • @hpaulj I wanted to find the fastest way for a large list. My concern was that a list comprehension is slow by itself, as it is Python loop code. So I thought there could be some fast and simple special method in the standard library, e.g. `import listtools; l2 = listtools.take(l, indexes)`. – Arty Oct 03 '20 at 07:35
  • @Arty. List comprehensions are quite a bit faster than general-purpose loops, and Paul's answer shows the closest Python has to `take`. – Mad Physicist Oct 03 '20 at 08:43
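
To illustrate hpaulj's point that indexing only copies references, here is a minimal sketch (my illustration, not part of the original thread):

l = [{'id': 0}, {'id': 1}, {'id': 2}]
i = [2, 0]
l2 = [l[si] for si in i]

print(l2[0] is l[2])  # True: the same object, only the reference was copied
l[2]['id'] = 99
print(l2[0])          # {'id': 99}: the mutation is visible through both lists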

2 Answers


You can get a minor speedup (~25% in the example below) by using operator.itemgetter, which supports bulk lookup:

>>> import string
>>> import random
>>> import operator as op
>>> from timeit import timeit

# create random lists
>>> l = [random.choice([*string.ascii_letters,*range(100)]) for _ in range(1000000)]
>>> i = [random.randint(0,999999) for _ in range(300000)]

# timings
>>> timeit(lambda:[l[si] for si in i],number=100)
3.0997245000035036
>>> timeit(lambda:list(map(l.__getitem__,i)),number=100)
2.892384369013598
>>> timeit(lambda:list(op.itemgetter(*i)(l)),number=100)
2.1787672539940104
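
One caveat worth noting (my addition, not from the original answer): op.itemgetter with a single index returns the bare element rather than a tuple, so the list(...) wrapper misbehaves when len(i) == 1 (and itemgetter() with no arguments raises a TypeError):

>>> op.itemgetter(1, 4)('abcdef')
('b', 'e')
>>> op.itemgetter(1)('abcdef')  # a single index returns the item itself, not a tuple
'b'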
Paul Panzer
  • Thanks! I wonder whether there is also some improved solution for the case when the indexes are a NumPy array? – Arty Oct 03 '20 at 09:28
  • @Arty numpy and pure python don't mix well, so in this scenario the best I can think of is to actually convert the indices to a list using `i.tolist()`. This will be faster, but obviously not as fast as having a list of indices from the get-go (see the sketch after these comments). – Paul Panzer Oct 03 '20 at 09:37
  • I've decided to measure NumPy usage, as I was also interested in solving the task when we have the indexes as a NumPy array; in this case we gain a `1.56x` speedup, and if `l` is also a NumPy array of objects then we gain a `3x` speedup. [Here is my answer](https://stackoverflow.com/a/64183092/941531). – Arty Oct 03 '20 at 10:17
  • @Arty Interesting. Actually, giving or not giving the `dtype` kwarg makes a huge difference, especially if NumPy decides it doesn't need the `object` dtype because it can use strings everywhere; this makes the final `tolist` much more expensive. But even if the only difference is that NumPy has to find out by itself that it has to use `object`, that takes a significant amount of extra time. – Paul Panzer Oct 03 '20 at 10:57
  • Yes, it is best to always provide `dtype`, because in the worst case NumPy has to scan the whole array to figure out the smallest type that can store every value of the array; then the conversion will be even twice as slow. Also, it is always good to provide the most narrow type: if you have integers use `np.int64`, if also floats then `np.float64`, if only strings then `np.str_`, if any object then `np.object_`. The most narrow type will give the best efficiency for all NumPy algorithms. – Arty Oct 03 '20 at 11:17
  • I wonder how efficient it is to pass 1 million positional arguments to the function `op.itemgetter(...)` via the `*i` unpacking operation. – Arty Oct 03 '20 at 11:31
  • @Arty This is probably special cased in the interpreter for the most important sequence types. – Paul Panzer Oct 03 '20 at 11:33
  • I did this with approx. 700,000 indices and it was tremendously faster using op.itemgetter than a list comprehension. A NumPy array would not fit into memory because the list contained strings of varying length, so NumPy array indexing was not possible. – John Aug 12 '21 at 14:22
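
A minimal sketch of the `i.tolist()` conversion Paul Panzer describes above (my illustration; names and sizes are arbitrary):

import operator as op
import numpy as np

l = list(range(1000000))                        # plain Python list of objects
ni = np.random.randint(0, len(l), size=300000)  # indexes as a NumPy array

# convert the NumPy indexes to a plain list first, then do the bulk lookup;
# this avoids the per-element overhead of iterating over a NumPy array
l2 = list(op.itemgetter(*ni.tolist())(l))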

It is known that NumPy arrays can also be used to store and process arbitrary Python objects through dtype = np.object_.

So I decided to measure the speed of NumPy usage compared to plain Python. Also, as I mentioned in my question, I want to solve the case when the indexes are a NumPy array of integers.

The next code measures the different cases: whether or not we need to convert the source lists to NumPy arrays, and whether the result should be converted back to a list too.


import string
from timeit import timeit
import numpy as np
np.random.seed(0)

letters = np.array(list(string.ascii_letters), dtype = np.object_)
nl = letters[np.random.randint(0, len(letters), size = (10 ** 6,))]  # data as an object array
l = nl.tolist()                                                      # the same data as a plain list
ni = np.random.permutation(np.arange(nl.size, dtype = np.int64))     # indexes as a NumPy array
i = ni.tolist()                                                      # the same indexes as a plain list

pyt = timeit(lambda: [l[si] for si in i], number = 10)
print('python:', round(pyt, 3), flush = True)

for l_from_list in [True, False]:
    for i_from_list in [True, False]:
        for l_to_list in [True, False]:
            def Do():
                # optionally convert the inputs to NumPy arrays, index, then optionally convert the result back
                cl = np.array(l, dtype = np.object_) if l_from_list else nl
                ci = np.array(i, dtype = np.int64) if i_from_list else ni
                res = cl[ci]
                res = res.tolist() if l_to_list else res
                return res
            ct = timeit(lambda: Do(), number = 10)
            print(
                'numpy:', 'l_from_list', l_from_list, 'i_from_list', i_from_list, 'l_to_list', l_to_list,
                'time', round(ct, 3), 'speedup', round(pyt / ct, 2), flush = True
            )

outputs:

python: 2.279
numpy: l_from_list True  i_from_list True  l_to_list True  time 2.924 speedup 0.78
numpy: l_from_list True  i_from_list True  l_to_list False time 2.805 speedup 0.81
numpy: l_from_list True  i_from_list False l_to_list True  time 1.457 speedup 1.56
numpy: l_from_list True  i_from_list False l_to_list False time 1.312 speedup 1.74
numpy: l_from_list False i_from_list True  l_to_list True  time 2.352 speedup 0.97
numpy: l_from_list False i_from_list True  l_to_list False time 2.209 speedup 1.03
numpy: l_from_list False i_from_list False l_to_list True  time 0.894 speedup 2.55
numpy: l_from_list False i_from_list False l_to_list False time 0.75  speedup 3.04

So we can see that if we store all lists as NumPy arrays up front, we gain a 3x speedup! If only the indexes are a NumPy array, we get a speedup of just 1.56x, which is still very good. In the case when everything has to be converted from lists there and back, we get a "speedup" of 0.78x, meaning we actually slow down; hence, if we work with plain lists only, then indexing through NumPy is not helpful.
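
In other words, the fast path from this benchmark amounts to keeping both the data and the indexes as NumPy arrays from the start and using fancy indexing directly (a minimal sketch, with arbitrary example data):

import numpy as np

nl = np.array(['a', 'b', 'c', 'd'], dtype = np.object_)  # data as an object array
ni = np.array([2, 0, 3], dtype = np.int64)               # indexes as an integer array

res = nl[ni]  # fancy indexing does the bulk lookup in C
# call res.tolist() only if a plain Python list is really required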

Arty
  • Couple of things: no need for `lambda: Do()`: you can just use `Do` (no parentheses). `Do` itself is doing a bunch of extra work with decisions and copying that it probably shouldn't be. Suggestion: use the fact that `timeit` accepts a string of semicolon-separated statements to build a command rather than making decisions at runtime (see the sketch after these comments). – Mad Physicist Oct 03 '20 at 15:53
  • Those are nitpicks BTW. +1 regardless – Mad Physicist Oct 03 '20 at 15:53
  • Two more things: in the interest of fairness, use `operator.itemgetter` as the other answer suggests, and try the benchmark for a few different sizes, like 100, 1k, 10k, 100k or so – Mad Physicist Oct 03 '20 at 15:55
  • Indexing like this is one area where object-dtype arrays outshine lists. Indexing doesn't care what the elements represent or what their dtype is. All it has to pay attention to is the strides, the number of bytes it has to step in each dimension. – hpaulj Oct 04 '20 at 05:46
  • @MadPhysicist I think that inside any function measured through `timeit` it is quite alright to make `if` decisions, as long as they take a very tiny fraction of one whole function run, especially when the code becomes easier and more readable. Of course, some heavy computational code that doesn't change the behavior of the function being timed should be placed outside, e.g. in timeit's setup argument. – Arty Oct 04 '20 at 06:18
  • BTW, after trying this a bit I enjoyed using NumPy arrays of object_ instead of pure Python lists for those cases when performance is needed; they are very convenient. – Arty Oct 04 '20 at 06:20
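
A minimal sketch of the string-based timeit approach Mad Physicist suggests (my illustration; the statements are fixed strings, so no runtime branching gets timed):

from timeit import timeit

setup = (
    "import numpy as np;"
    "np.random.seed(0);"
    "nl = np.array(list('abcdef') * 100000, dtype=np.object_);"
    "ni = np.random.permutation(np.arange(nl.size));"
    "l = nl.tolist(); i = ni.tolist()"
)

# each variant is a plain statement string, so timeit measures only the lookup itself
print(timeit("[l[si] for si in i]", setup=setup, number=10))
print(timeit("nl[ni].tolist()", setup=setup, number=10))
print(timeit("nl[ni]", setup=setup, number=10))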