
I have a regular list called a, and a NumPy array of indices b.
(No, it is not possible for me to convert a to a NumPy array.)

Is there any way for me to get the same effect as `a[b]` efficiently? To be clear, this implies that I don't want to extract every individual int in `b` due to the performance implications.

(Yes, this is a bottleneck in my code. That's why I'm using NumPy arrays to begin with.)
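For concreteness, a minimal reproduction of the situation (the values here are made up; only the types matter):

```python
import numpy as np

a = ['x', None, 'y', 'z']   # a plain Python list that must stay a list
b = np.array([2, 0, 3])     # NumPy array of indices into a

# a[b] raises TypeError: only integer scalar arrays can be converted to an index.
# The desired result is what a[b] would mean under fancy indexing:
result = [a[i] for i in b]  # ['y', 'x', 'z']
```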

user541686
  • I don't know how much faster (if at all) `operator.itemgetter()` would be. – Ignacio Vazquez-Abrams Jul 31 '16 at 00:18
  • What is your plan for (what would be) `a[b]`? It's hard to imagine a use for it that won't "extract an individual `int` for every integer in `b`"... eventually. If you're concerned about wasting space having a list and a sublist hanging around simultaneously, it seems that you could iterate (or whatever) over `b` at the time of need, instead of the (would be) `a[b]`. – jedwards Jul 31 '16 at 00:33
  • @jedwards: My sentence was a little ambiguous (fixed), but what I was saying was that I'm trying to avoid extracting the individual elements of `b` (I don't need them and it slows my code down). I'm using the extracted elements of `a` afterward (e.g. looking at which ones are `None`, etc... it's not really relevant), but that hardly implies I need to extract `b`'s elements manually in the process. – user541686 Jul 31 '16 at 00:43
  • `itemgetter` creates a class instance with a callable. I think its speed improvement comes from a simpler interpretation or calling stack. It does not use any special C code (that I can see). I get, at best, a 2x speed improvement. – hpaulj Jul 31 '16 at 00:59
  • For me (Python3, Numpy 1.10.4) -- `itemgetter` seems the fastest, a list comprehension over `b` being ~56% slower, and a comprehension over `np.nditer(b)` being ~134% slower. – jedwards Jul 31 '16 at 01:00
  • `nditer` isn't a speed help, even when used for real numpy cases. It's a stepping stone toward compiled C code (e.g. via Cython). – hpaulj Jul 31 '16 at 01:26

2 Answers

import operator
import numpy as np

a = list(range(1000000))
b = np.random.randint(0, len(a), 10000)

%timeit np.array(a)[b]
10 loops, best of 3: 84.8 ms per loop

%timeit [a[x] for x in b]
100 loops, best of 3: 2.93 ms per loop

%timeit operator.itemgetter(*b)(a)
1000 loops, best of 3: 1.86 ms per loop

%timeit np.take(a, b)
10 loops, best of 3: 91.3 ms per loop

I had high hopes for numpy.take(), but it is far from optimal here. I tried some Numba solutions as well, and they yielded similar times, around 92 ms.

So a simple list comprehension is not far from the best here, but operator.itemgetter() wins, at least for input sizes at these orders of magnitude.
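A quick sketch of the itemgetter approach, including one edge case worth knowing about (the values here are arbitrary):

```python
import operator
import numpy as np

a = list(range(100))
b = np.array([5, 0, 42])

# itemgetter(*b) builds a single callable that fetches all the indices
# in one call, returning a tuple of the selected elements
picked = list(operator.itemgetter(*b)(a))  # [5, 0, 42]

# Edge case: with exactly one index, itemgetter returns the bare element
# rather than a 1-tuple, so generic code needs a length check on b
single = operator.itemgetter(*np.array([7]))(a)  # 7, not (7,)
```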

John Zwinck

Write a Cython function:

import cython
from cpython cimport PyList_New, PyList_SET_ITEM, Py_INCREF

@cython.wraparound(False)
@cython.boundscheck(False)
def take(list alist, Py_ssize_t[:] arr):
    cdef:
        Py_ssize_t i, idx, n = arr.shape[0]
        list res = PyList_New(n)
        object obj

    for i in range(n):
        idx = arr[i]
        obj = alist[idx]
        Py_INCREF(obj)               # PyList_SET_ITEM steals a reference
        PyList_SET_ITEM(res, i, obj)

    return res
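The function above lives in a `.pyx` module and must be compiled before use. One minimal way to do that (assuming Cython is installed and the code is saved as `take_list.pyx`; the filename and the use of pyximport are my assumptions, not part of the answer) is:

```python
# Build sketch, not part of the original answer: compiles take_list.pyx
# on first import. Assumes Cython is installed and take_list.pyx exists.
import pyximport
pyximport.install(language_level=3)

from take_list import take  # the Cython take() defined above
```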

The result of %timeit:

import numpy as np

al = list(range(10000))
aa = np.array(al)

ba = np.random.randint(0, len(al), 10000)
bl = ba.tolist()

%timeit [al[i] for i in bl]
%timeit np.take(aa, ba)
%timeit take(al, ba)

1000 loops, best of 3: 1.68 ms per loop
10000 loops, best of 3: 51.4 µs per loop
1000 loops, best of 3: 254 µs per loop

numpy.take() is the fastest if both arguments are ndarray objects. The Cython version is about 6x faster than the list comprehension (254 µs vs. 1.68 ms).
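For reference, the Cython `take()` is semantically just a preallocate-and-fill loop; a pure-Python equivalent (my sketch, useful as a correctness check, though of course much slower) would be:

```python
import numpy as np

def take_py(alist, arr):
    # same semantics as the Cython take(): preallocate, then fill by index
    res = [None] * len(arr)
    for i, idx in enumerate(arr):
        res[i] = alist[idx]
    return res

al = list(range(10000))
ba = np.random.randint(0, len(al), 10000)
assert take_py(al, ba) == [al[i] for i in ba]
```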

HYRY
  • Nice. Now I wonder why my Numba attempts were so slow. If you know Numba, can you try to make a Numba function that's faster than the list comprehension? I couldn't. – John Zwinck Jul 31 '16 at 01:49
  • @JohnZwinck, Numba makes an internal copy of the list: http://numba.pydata.org/numba-doc/dev/reference/pysupported.html#list – HYRY Aug 01 '16 at 11:05