6

I have a 2D numpy array S representing a state space, with 80000000 rows (as states) and 5 columns (as state variables).

I initialize K0 with S, and at each iteration, I apply a state transition function f(x) on all of the states in Ki, and delete states whose f(x) is not in Ki, resulting Ki+1. Until it converges i.e. Ki+1 = Ki.

Going like this would take ages:

K = S
to_delete = [0]
While to_delete:
    to_delete = []
    for i in xrange(len(K)):
        if not f(i) in K:
        to_delete.append(K(i))
    K = delete(K,to_delete,0)

So I wanted to make a vectorized implementation :

slice K in columns, apply f and, join them once again, thus obtaining f(K) somehow.

The question now is how to get an array of length len(K), say Sel, where each row Sel[i] determine whether f(K[i]) is in K. Exactly like the function in1d works.

Then it would be simple to make

K=K[Sel]]
Alex Riley
  • 169,130
  • 45
  • 262
  • 238
amine23
  • 327
  • 3
  • 10

3 Answers3

5

Your question is difficult to understand because it contains extraneous information and contains typos. If I understand correctly, you simply want an efficient way to perform a set operation on the rows of a 2D array (in this case the intersection of the rows of K and f(K)).

You can do this with numpy.in1d if you create structured array view.

Code:

if this is K:

In [50]: k
Out[50]:
array([[6, 6],
       [3, 7],
       [7, 5],
       [7, 3],
       [1, 3],
       [1, 5],
       [7, 6],
       [3, 8],
       [6, 1],
       [6, 0]])

and this is f(K) (for this example I subtract 1 from the first col and add 1 to the second):

In [51]: k2
Out[51]:
array([[5, 7],
       [2, 8],
       [6, 6],
       [6, 4],
       [0, 4],
       [0, 6],
       [6, 7],
       [2, 9],
       [5, 2],
       [5, 1]])

then you can find all rows in K also found in f(K) by doing something this:

In [55]: k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').
reshape(k2.shape[0]))]
Out[55]: array([[6, 6]])

view and reshape create flat structured views so that each row appears as a single element to in1d. in1d creates a boolean index of k of matched items which is used to fancy index k and return the filtered array.

Paul
  • 42,322
  • 15
  • 106
  • 123
  • 1
    Great! but what is `dtype = 'i,i'` ? Could this method work if my rows are of any type and any number of columns? for instance rows like `[-0.5,0.5,1,-6,20]`. – amine23 Apr 26 '13 at 16:27
  • 1
    @amine23 I've put a link to structured array documentation in my answer. Yes, floats, strings, booleans are all allowed fields in a dtype struct. – Paul Apr 26 '13 at 17:27
  • 2
    I get `ValueError: total size of new array must be unchanged` if I try your example. Haven't sorted out what's changed to break it (I have numpy 1.8.2, python 2.7.6). – Kyle Apr 22 '16 at 09:53
  • Great solution. I had to use dtype='int,int' in my case. Otherwise the structured array was 2x the length of the original [N,2] array. – Andrew Oct 17 '19 at 14:08
1

Not sure if I understand your question entirely, but if the interpretation of Paul is correct, it can be solved efficiently and fully vectorized using the numpy_indexed package as such in a single readable line:

import numpy_indexed as npi
K = npi.intersection(K, f(K))

Also, this works for rows of any type or shape.

Eelco Hoogendoorn
  • 10,459
  • 1
  • 44
  • 42
0

Above answer is great.

But if one doesn't want to mingle with structured arrays and wants a solution that doesn't care what is the type of your array, nor the dimensions of your array elements, I came up with this :

k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]

basically, list(map(np.ndarray.dumps, k))instead of k.view(dtype='f8,f8').reshape(k.shape[0]).

Take into account that this solution is ~50 times slower.

k = np.array([[6.5, 6.5],
       [3.5, 7.5],
       [7.5, 5.5],
       [7.5, 3.5],
       [1.5, 3.5],
       [1.5, 5.5],
       [7.5, 6.5],
       [3.5, 8.5],
       [6.5, 1.5],
       [6.5, 0.5]])
k = np.tile(k, (1000, 1))

k2 = np.c_[k[:, 0] - 1, k[:, 1] + 1]


In [132]: k.shape, k2.shape
Out[132]: ((10000, 2), (10000, 2))

In [133]: timeit k[np.in1d(k.view(dtype='f8,f8').reshape(k.shape[0]),k2.view(dtype='f8,f8').reshape(k2.shape[0]))]
10 loops, best of 3: 22.2 ms per loop

In [134]: timeit k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]
1 loop, best of 3: 892 ms per loop

It can be marginal for small inputs, but for the op's, it will take 1h 20min instead of 2 min.

Jacquot
  • 1,750
  • 15
  • 25