1

I'm trying to speed up a program I've written, and after importing cProfile, I see that one function takes up a massive bit of computation time.

It's this, which finds an numpy.ndarray in a list:

    def locate(arr, l ):
        for i in range(len(l)):
            if np.all(l[i] == arr):
                return i
        return -1

As the list can't be ordered etc, I can't see any way to avoid scanning the entire list. I have read some pieces on vectorisation, and I wanted to know if that could be applied here, or if there's any other way to speed this up?

Thanks

kmario23
  • 57,311
  • 13
  • 161
  • 150
cucumber
  • 35
  • 1
  • 3
  • Shouldn't it be `np.any(l[i] == arr)` instead of `np.all(l[i] == arr)`? – kmario23 Mar 30 '19 at 01:01
  • give us some idea of the dimensions. Is `arr` the array, and `l` a list of matching shape arrays? – hpaulj Mar 30 '19 at 01:32
  • Looking at the answers I think you need to give a working example. It's too easy to make assumptions that might not apply. – hpaulj Mar 30 '19 at 01:44
  • Your code short-circuits, returning as soon as it finds a match. It will scan the whole list is there isn't a match. So timing depends a lot on where the find is, early or late. Searches that treat `l` as an array (1 higher dimension than `arr`) will scan the whole array (more than once). They can be faster, if the find is late. – hpaulj Mar 30 '19 at 02:17

4 Answers4

2

You probably cannot avoid walking the list but you can speed up the comparison:

Set up example:

L  = list(np.floor(np.outer(*2*(np.linspace(1,10,1000),))))
arr = L[537]

Direct method for reference:

import itertools as it

next(it.chain((i for i, a in enumerate(L) if np.all(arr==a)), (-1,)))
# 537
timeit(lambda: next(it.chain((i for i, a in enumerate(L) if np.all(arr==a)), (-1,))), number=100)
# 0.27100146701559424

Approach 1: Use np.array_equal (slower)

next(it.chain((i for i, a in enumerate(L) if np.array_equal(arr, a)), (-1,)))
# 537
timeit(lambda: next(it.chain((i for i, a in enumerate(L) if np.array_equal(arr, a)), (-1,))), number=100)
# 0.2992244770284742

Approach 2: Use void view (faster)

arr_v = arr.reshape(-1).view(f'V{arr.itemsize*arr.size}')

next(it.chain((i for i, a in enumerate(L) if arr_v==a.reshape(-1).view(f'V{a.itemsize*a.size}')), (-1,)))
# 537
timeit(lambda: next(it.chain((i for i, a in enumerate(L) if arr_v==a.reshape(-1).view(f'V{a.itemsize*a.size}')), (-1,))), number=100)
# 0.11853155982680619
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99
  • Thanks. I'm trying to implement the void view approach but I'm a bit confused. As I understand it in your example, L is my list of numpy arrays and arr is the numpy array i'm searching for. However I get an error for a.reshape that 'list' object has no attribute 'reshape', even though I assumed a would be a numpy array. Am I understanding it wrong? – cucumber Mar 30 '19 at 11:32
  • @cucumber Nope, you are understanding it right. `L` is supposed to be a list of arrays as you say. Your getting that error suggests that you are using an `L` that does somehow contain lists? – Paul Panzer Mar 30 '19 at 11:43
1

There is a built-in python function called index() where you can use it by plugging a string in as the value and finding its index in the list.

Ryan Wans
  • 28
  • 5
0

So are you looking for np.where

temp_list=np.array(temp_list)
np.where(temp_list==5)
(array([1, 3, 6, 8]),)
BENY
  • 317,841
  • 20
  • 164
  • 234
0

How to Check if a Matrix is in a List of Matrices Python

Here the accepted answer uses np.array_equal which first checks shape, then does the all(==) test.

Another SO: Check if 2d array exists in 3d array in Python?

Searching an array for a value faster than np.where(ar==value) using fortran and f2py

Why isn't "numpy.any" lazy (short-circuiting)

hpaulj
  • 221,503
  • 14
  • 230
  • 353