How to efficiently split a pair of arrays based on a condition on one of them

Question

I have two 2D NumPy arrays: r, which is real and contains points in space given by their Cartesian coordinates, and v, which holds a complex vector defined at each of these points. I would like to split both of these arrays based on a condition on r.

For example, r1 should contain all points whose first Cartesian coordinate is positive, and v1 the corresponding values of v. All other points and their corresponding vectors go into r2 and v2.

Based on this question, and the fact that zip is essentially its own inverse, I currently have the following solution:

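import numpy as np

# Keep the (point, vector) pairs that satisfy each condition, then unzip them
r1, v1 = zip(*[rv for rv in zip(r, v) if rv[0][0] > 0.0])
r2, v2 = zip(*[rv for rv in zip(r, v) if rv[0][0] <= 0.0])
# zip yields tuples of rows; convert them back to arrays
r1 = np.array(r1)
r2 = np.array(r2)
v1 = np.array(v1)
v2 = np.array(v2)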

This works well enough for my purposes; however, it involves converting to large lists of arrays, which is surely quite inefficient.

Is there an alternative solution, which is fast, concise and avoids the creation of intermediate lists?

score 3 · Accepted Answer · answered Sep 08 '11 at 07:45

You can use a boolean array as an index to select values:

Create some random test data first:

import numpy as np
np.random.seed(0)
r = np.random.rand(10,2)-0.5
v = np.random.rand(10) + np.random.rand(10)*1j

Then:

idx = r[:, 0] > 0  # idx is a boolean array, one entry per row of r
r1 = r[idx]        # rows of r where idx is True
v1 = v[idx]        # matching entries of v

r2 = r[~idx]       # ~ on a boolean array is element-wise logical NOT
v2 = v[~idx]
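Because a 1D boolean mask indexes along the first axis, the same four lines also work when v is 2D, i.e. one complex vector per point as described in the question. The sketch below wraps them in a helper; the function name `split_by_condition` and the 3D test data are hypothetical, chosen only for illustration. Note that boolean indexing returns copies, not views.

```python
import numpy as np

def split_by_condition(r, v, keep):
    """Split r and v row-wise using a boolean mask `keep` of length len(r).

    Rows where keep is True go to the first pair, the rest to the second.
    """
    return r[keep], v[keep], r[~keep], v[~keep]

# Hypothetical test data shaped like the question: 10 points in 3D and a
# complex 3-vector attached to each point.
rng = np.random.default_rng(0)
r = rng.random((10, 3)) - 0.5
v = rng.random((10, 3)) + 1j * rng.random((10, 3))

r1, v1, r2, v2 = split_by_condition(r, v, r[:, 0] > 0.0)
```

Each row of v stays paired with its row of r, because both arrays are indexed with the same mask.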

HYRY

very nice, I didn't know you could use a bool array to index stuff – steabert Sep 08 '11 at 08:01

Answer 2 · answered Sep 08 '11 at 07:05 by steabert

When checking conditions on NumPy arrays, I usually end up using numpy.where. Called with only a condition as its argument, it returns the indices where the condition holds (as a tuple with one index array per dimension):

import numpy

i1 = numpy.where(r[:, 0] > 0.0)   # tuple of index arrays; i1[0] holds the row indices where column 0 > 0.0
i2 = numpy.where(r[:, 0] <= 0.0)
r1 = numpy.take(r, i1[0], 0)      # take rows of r (slices along axis 0)
v1 = numpy.take(v, i1[0], 0)
r2 = numpy.take(r, i2[0], 0)
v2 = numpy.take(v, i2[0], 0)
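A self-contained sketch of the numpy.where/numpy.take route, run on the same random test data as the accepted answer. It shows that the single-argument where returns a 1-tuple for a 1D condition (hence the `[0]`), and checks that the selected rows match boolean indexing with the same condition. Using `~cond` for the second group is an assumption of this sketch; it agrees with `<= 0.0` only when r contains no NaNs.

```python
import numpy as np

np.random.seed(0)
r = np.random.rand(10, 2) - 0.5
v = np.random.rand(10) + np.random.rand(10) * 1j

cond = r[:, 0] > 0.0
i1 = np.where(cond)    # a 1-tuple: (array of row indices,)
i2 = np.where(~cond)   # complementary rows

r1 = np.take(r, i1[0], axis=0)
v1 = np.take(v, i1[0], axis=0)
r2 = np.take(r, i2[0], axis=0)
v2 = np.take(v, i2[0], axis=0)

# The index route and the boolean-mask route select the same rows.
assert np.array_equal(r1, r[cond]) and np.array_equal(v2, v[~cond])
```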

Somewhat shorter: in this case you can just use numpy.compress, which combines both steps:

import numpy
larger = r[:, 0] > 0.0
r1 = numpy.compress(larger, r, 0)   # keep rows of r where larger is True
v1 = numpy.compress(larger, v, 0)
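For completeness, a sketch that performs the whole split with numpy.compress on the accepted answer's random test data. It uses `~larger` for the complementary rows, which is equivalent to `<= 0.0` as long as r contains no NaNs. The result should be the same arrays that boolean indexing with the same mask produces.

```python
import numpy as np

np.random.seed(0)
r = np.random.rand(10, 2) - 0.5
v = np.random.rand(10) + np.random.rand(10) * 1j

larger = r[:, 0] > 0.0
# numpy.compress(condition, a, axis) keeps the slices along `axis` where condition is True
r1 = np.compress(larger, r, axis=0)
v1 = np.compress(larger, v, axis=0)
r2 = np.compress(~larger, r, axis=0)
v2 = np.compress(~larger, v, axis=0)
```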

I do not know whether this is faster, but it uses only arrays, with no intermediate lists.

EDIT: You might also want to look at masked arrays if you want to operate on r and v directly.
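A minimal sketch of the masked-array idea, assuming the goal is to compute on the "positive" rows without building split copies. numpy.ma.masked_array references the input data by default (copy=False), and reductions skip masked entries. The helper `hide_rows` is hypothetical; it expands a per-row flag so the mask has the same shape as the array, which the masked-array constructor expects.

```python
import numpy as np

np.random.seed(0)
r = np.random.rand(10, 2) - 0.5
v = np.random.rand(10) + np.random.rand(10) * 1j

def hide_rows(arr, keep):
    """Hypothetical helper: mask every row of arr where keep is False."""
    hide = ~keep
    # Add trailing singleton axes so the per-row flag broadcasts over each row.
    hide = hide.reshape(hide.shape + (1,) * (arr.ndim - 1))
    # Materialize a full-shape, writeable mask.
    return np.ma.masked_array(arr, mask=np.broadcast_to(hide, arr.shape).copy())

positive = r[:, 0] > 0.0
r_pos = hide_rows(r, positive)
v_pos = hide_rows(v, positive)

# Reductions ignore masked rows.
centroid = r_pos.mean(axis=0)
total = v_pos.sum()
```

If an actual array of the kept rows is needed later, `r_pos.compressed()` returns the unmasked values flattened in row order, so `.reshape(-1, 2)` recovers them for this 2-column r.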

That's a nice answer, I didn't know about the single argument form of "where". But the bool array indexing is even better! — DaveP, Sep 08 '11 at 22:40
