3

What is the fastest way to delete elements from a NumPy array while retrieving their initial positions? The following code does not return all the elements that it should:

list = []
for pos,i in enumerate(ARRAY):
    if i < some_condition:
        list.append(pos)  #This is where the loop fails

for _ in list:
    ARRAY = np.delete(ARRAY, _)
KeVal

2 Answers

5

It really feels like you're going about this inefficiently. You should probably be using more of NumPy's built-in capabilities, e.g. np.where or boolean indexing. Calling np.delete in a loop like that is going to kill any performance gain you get from using NumPy, because every call returns a new copy of the array.

For example (with boolean indexing):

keep = np.ones(ARRAY.shape, dtype=bool)
for pos, val in enumerate(ARRAY):
    if val < some_condition:
        keep[pos] = False
ARRAY = ARRAY[keep]

Of course, this could possibly be simplified (and generalized) even further:

ARRAY = ARRAY[ARRAY >= some_condition]

EDIT

You've stated in the comments that you need the same mask to operate on other arrays as well -- That's not a problem. You can keep a handle on the mask and use it for other arrays:

mask = ARRAY >= some_condition
ARRAY = ARRAY[mask]
OTHER_ARRAY = OTHER_ARRAY[mask]
...
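
The same mask also yields the original positions of the removed elements, which is what the question asks for. Here is a minimal sketch, assuming small made-up arrays; the names values, labels and threshold are illustrative and do not come from the question:

```python
import numpy as np

# Hypothetical sample data; names and threshold are illustrative only.
values = np.array([5, 1, 7, 2, 9])
labels = np.array(["a", "b", "c", "d", "e"])
threshold = 3

mask = values >= threshold                  # True for the elements to keep
removed_positions = np.flatnonzero(~mask)   # original indices of the dropped elements

# One mask filters every array that shares the same layout.
values_kept = values[mask]
labels_kept = labels[mask]

print(removed_positions)   # positions that were removed
print(values_kept, labels_kept)
```

Because the mask is computed once from the unmodified array, the positions never go stale, unlike indices collected before a series of deletions.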

Additionally (and perhaps this is the reason your original code isn't working), as soon as you delete the first index from the array in your loop, all of the other items shift one index to the left, so you're not actually deleting the same items that you "tagged" on the initial pass.

As an example, let's say that your original array was [a, b, c, d, e] and on the original pass you tagged the elements at indexes [0, 2] for deletion (a, c). On the first pass through your delete loop, you'd remove the item at index 0, which would make your array:

[b, c, d, e]

Now, on the second iteration of your delete loop, you're going to delete the item at index 2 in the new array:

[b, c, e]

But look, instead of removing c like we wanted, we actually removed d! Oh snap!

To fix that, you could probably write your loop over reversed(list), but that still won't result in a fast operation.
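
The shift can be reproduced directly. This is a small sketch in which the integers 10 to 50 stand in for a to e (an assumption made only for illustration); it compares front-to-back deletion, back-to-front deletion, and a single np.delete call:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])   # stands in for [a, b, c, d, e]
tagged = [0, 2]                        # positions of a and c

# Front to back: every deletion shifts the later indices left by one.
wrong = arr.copy()
for idx in tagged:
    wrong = np.delete(wrong, idx)

# Back to front: earlier indices are still valid after each deletion.
right = arr.copy()
for idx in reversed(tagged):
    right = np.delete(right, idx)

# Passing all indices at once avoids the loop entirely.
single = np.delete(arr, tagged)

print(wrong)    # d (40) was removed instead of c (30)
print(right)
print(single)
```

The reversed loop gives the intended result, but it still copies the array once per deleted element, so the single call or a boolean mask is the better choice for large arrays.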

mgilson
  • I was thinking of using a while loop, but it won't return the positions of the ill elements. And I need the positions because some other arrays also need to have the elements at those specific places removed (so a graph can be plotted later). – KeVal Jan 21 '16 at 05:11
  • @KeVal -- That's not a problem. Just keep a handle on the mask and use it for the other arrays too. (see my edit). – mgilson Jan 21 '16 at 06:03
2

You don't need to iterate, especially with a simple condition like this. And you don't really need to use delete:

A sample array:

In [693]: x=np.arange(10)

A mask is a boolean array that is True where a condition holds (and False elsewhere):

In [694]: msk = x%2==0
In [695]: msk
Out[695]: array([ True, False,  True, False,  True, False,  True, False,  True, False], dtype=bool)

where (or nonzero) converts it to indexes:

In [696]: ind=np.where(msk)
In [697]: ind
Out[697]: (array([0, 2, 4, 6, 8], dtype=int32),)

You use the whole ind in one call to delete (no need to iterate):

In [698]: np.delete(x,ind)
Out[698]: array([1, 3, 5, 7, 9])

You can use ind to retain those values instead:

In [699]: x[ind]
Out[699]: array([0, 2, 4, 6, 8])

Or you can use the boolean msk directly:

In [700]: x[msk]
Out[700]: array([0, 2, 4, 6, 8])

or use its inverse:

In [701]: x[~msk]
Out[701]: array([1, 3, 5, 7, 9])

delete doesn't do much more than this kind of boolean masking. It's all Python code, so you can easily study it.
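
To make that concrete, here is a rough sketch of the masking idea for the 1-D, integer-index case. delete_via_mask is a hypothetical helper written for illustration; it is not NumPy's actual implementation, which also handles axes, slices and bounds checking:

```python
import numpy as np

def delete_via_mask(arr, indices):
    """Hypothetical helper: drop `indices` from a 1-D array with a boolean mask."""
    keep = np.ones(arr.shape[0], dtype=bool)   # start by keeping everything
    keep[indices] = False                      # mark the positions to drop
    return arr[keep]

x = np.arange(10)
ind = np.flatnonzero(x % 2 == 0)               # positions of the even values

result = delete_via_mask(x, ind)
print(result)
print(np.array_equal(result, np.delete(x, ind)))
```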

hpaulj