Python np.delete issue

Question

import numpy as np

A = np.array([[1,2,3],[3,4,5],[5,6,7]])
X = np.array([[0, 1, 0]])
for i in xrange(np.shape(X)[0]):
    for j in xrange(np.shape(X)[1]):
        if X[i,j] == 0.0:
            A = np.delete(A, j, axis=0)

I am trying to delete row j from A wherever X has a 0 at index j. I get

IndexError: index 2 is out of bounds for axis 0 with size 2

If X is [[0,1,0]], then A should become [[3,4,5]].

score 3 · Accepted Answer · edited May 23 '17 at 11:43

Don't call np.delete in a loop. It would be quicker to use boolean indexing:

In [6]: A[X.astype(bool).any(axis=0)]
Out[6]: array([[3, 4, 5]])

X.astype(bool) turns 0 into False and any non-zero value into True:

In [9]: X.astype(bool).any(axis=0)
Out[9]: array([False,  True, False], dtype=bool)

The call to .any(axis=0) returns True for each column of X.astype(bool) that contains at least one True, and False otherwise.
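As a self-contained sketch of the boolean-indexing approach, here it is as a plain script using the A and X from the question. The names keep and result are only illustrative, and this assumes X has exactly as many columns as A has rows:

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7]])
X = np.array([[0, 1, 0]])

# One flag per column of X: True if that column holds any non-zero value.
keep = X.astype(bool).any(axis=0)

# A 1-D boolean index on A selects the rows whose flag is True.
result = A[keep]
print(result)
```

Note that A itself is left untouched; result is a new array holding only the kept rows.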

Removing items from a list (or array) while looping over that same list is a classic pitfall. The problem is caused by the fact that removing items changes the meaning of ordinal indexing: if you then use the old ordinal indices to remove other items, you may end up removing the wrong items, or get an IndexError if you try to index beyond the valid range of the modified list.
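The same pitfall can be reproduced with a plain Python list. This is only a small illustration (items, to_remove, error_raised and safe are made-up names), and deleting from the highest index down is just one common workaround, not the only one:

```python
items = ['a', 'b', 'c']
to_remove = [0, 2]  # positions chosen against the ORIGINAL list

error_raised = False
try:
    for idx in to_remove:
        # After index 0 is deleted the list has two items, so index 2 is gone.
        del items[idx]
except IndexError:
    error_raised = True

# Deleting from the highest index down keeps the remaining indices valid.
safe = ['a', 'b', 'c']
for idx in sorted(to_remove, reverse=True):
    del safe[idx]

print(error_raised, safe)
```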

In your case, when you loop through rows of A and also delete rows of A:

for j in xrange(np.shape(X)[1]):
    if X[i,j] == 0.0:
        A = np.delete(A, j, axis=0)

each time you modify A, the rows that remain shift to new indices. So

        A = np.delete(A, 0, axis=0)

deletes the first row of the original A, but now the new A only has two rows. So

        A = np.delete(A, 2, axis=0)

raises an IndexError since 2 refers to the third row, and the new A does not have a third row. This problem is only exacerbated if the i-loop causes A = np.delete(A, j, axis=0) to be called multiple times too.
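If you would still rather use np.delete, one way to sidestep the shifting indices is to work out all the row indices first and call np.delete once, since it accepts an array of indices. A rough sketch, where rows_to_drop and trimmed are just illustrative names:

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7]])
X = np.array([[0, 1, 0]])

# Indices of the columns of X that contain only zeros.
rows_to_drop = np.flatnonzero(~X.astype(bool).any(axis=0))

# All indices refer to the original A, so nothing shifts between deletions.
trimmed = np.delete(A, rows_to_drop, axis=0)
print(rows_to_drop, trimmed)
```

This gives the same rows as the boolean-indexing version; it only differs in naming the rows to remove rather than the rows to keep.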
