
Suppose I have a NumPy ndarray M with the following content at M[0,:]:

[2, 3.9, 7, 9, 0, 1, 8.1, 3.2]

and I am given an integer, k, at runtime between 0 and 7. I want to produce the vector consisting of all items in this row except at column k. (Example: if k=3, then the desired vector is [2,3.9,7,0,1,8.1,3.2])

Is there an easy way to do this?

What if I have a vector of indices k, one for each row of M, representing the column I want to exclude from the row?

I'm kind of lost, other than a non-vectorized loop that mutates a result matrix:

nrows = M.shape[0]
result = np.zeros((nrows, M.shape[1]-1))  # shape must be passed as a tuple
for irow in range(nrows):
    result[irow,:k[irow]] = M[irow,:k[irow]]   # content before the split point
    result[irow,k[irow]:] = M[irow,k[irow]+1:] # content after the split point
Jason S
  • [another question is close](https://stackoverflow.com/questions/19286657/index-all-except-one-item-in-python), but it's 1D, and this involves 2D, so it's not an exact duplicate. – Jason S Dec 27 '18 at 20:10
  • [another question that is also 1D](https://stackoverflow.com/questions/7429118/how-to-get-all-the-values-from-a-numpy-array-excluding-a-certain-index)... to those of you marking as possible duplicates, did you read this question carefully? there are so many well-meaning attempts at "improvements" on this site that are a bit hasty. – Jason S Dec 27 '18 at 20:19
  • vaguely related to the inverse problem: https://stackoverflow.com/questions/17074422/select-one-element-in-each-row-of-a-numpy-array-by-column-indices – Jason S Dec 27 '18 at 20:40
  • **Please read this / other questions carefully before marking as a duplicate.** – Jason S Dec 28 '18 at 13:09
  • Please reopen, this is different from the nominated question: https://stackoverflow.com/questions/7429118/how-do-i-get-all-the-values-from-a-numpy-array-excluding-a-certain-index – Jason S Jun 29 '19 at 01:18

2 Answers


One approach would be with masking/boolean-indexing -

mask = np.ones(M.shape,dtype=bool)
mask[np.arange(len(k)),k] = 0
out = M[mask].reshape(len(M),-1)
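As a quick check of the masking approach, here is a minimal sketch on a small array. The values of `M` and `k` are assumptions for illustration: the first row is the one from the question and the second row is made up. The result is compared against a row-by-row `np.delete`, which is not part of this answer and serves only as a reference -

```python
import numpy as np

# Illustrative data: first row taken from the question, second row made up.
M = np.array([[2, 3.9, 7, 9, 0, 1, 8.1, 3.2],
              [10, 11, 12, 13, 14, 15, 16, 17]], dtype=float)
k = np.array([3, 0])  # column to drop in each row

mask = np.ones(M.shape, dtype=bool)
mask[np.arange(len(k)), k] = False   # clear exactly one entry per row
out = M[mask].reshape(len(M), -1)    # boolean indexing flattens in row-major order

# Reference result: drop the column one row at a time with np.delete.
expected = np.stack([np.delete(row, col) for row, col in zip(M, k)])
assert np.array_equal(out, expected)
```

The reshape works because every row loses the same number of items, so the flattened result splits evenly back into rows.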

Alternatively, we could use broadcasting to get that mask -

np.not_equal.outer(k,np.arange(M.shape[1]))
# or k[:,None]!=np.arange(M.shape[1])

Thus, giving us a one-liner/compact version -

out = M[k[:,None]!=np.arange(M.shape[1])].reshape(len(M),-1)
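To see what the broadcasted comparison produces, here is a small sketch (the array size and `k` are made up for illustration) that builds the mask both ways and confirms they agree. Note that `k` has to be a NumPy array, not a plain list, for `k[:,None]` to work -

```python
import numpy as np

# Illustrative data, not from the question.
M = np.arange(12).reshape(3, 4)
k = np.array([1, 3, 0])

# Mask built by advanced indexing, as in the first method.
mask_indexed = np.ones(M.shape, dtype=bool)
mask_indexed[np.arange(len(k)), k] = False

# Mask built by broadcasting: shape (3,1) compared against shape (4,) gives (3,4).
mask_broadcast = k[:, None] != np.arange(M.shape[1])

assert np.array_equal(mask_indexed, mask_broadcast)
out = M[mask_broadcast].reshape(len(M), -1)
```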

To exclude multiple columns per row, adjust the advanced-indexing step of the first method -

def exclude_multiple(M,*klist):
    k = np.stack(klist).T
    mask = np.ones(M.shape,dtype=bool)
    mask[np.arange(len(k))[:,None],k] = 0
    out = M[mask].reshape(len(M),-1)
    return out

Sample run -

In [185]: M = np.arange(40).reshape(4,10)

In [186]: exclude_multiple(M,[1,3,2,0],[4,5,8,1])
Out[186]: 
array([[ 0,  2,  3,  5,  6,  7,  8,  9],
       [10, 11, 12, 14, 16, 17, 18, 19],
       [20, 21, 23, 24, 25, 26, 27, 29],
       [32, 33, 34, 35, 36, 37, 38, 39]])
Divakar
  • Thanks, your numpy-fu far exceeds mine. Which of those methods would you expect to be more performant for large arrays? – Jason S Dec 27 '18 at 20:51
  • @JasonS I would go with the first one for being memory efficient and hopefully should translate to perf. – Divakar Dec 27 '18 at 20:52
  • any comment about using `mask[xrange(len(k)),k]` vs. `mask[np.arange(len(k)),k]`? – Jason S Dec 27 '18 at 20:53
  • @JasonS I would use `np.arange`. Arrays work better. – Divakar Dec 27 '18 at 20:53
  • i'd love to read a blog post about all this stuff. hard to predict what's going to happen until you try a bunch of things :/ – Jason S Dec 27 '18 at 20:58
  • @JasonS True, you learn about those mostly by using them and trying out various tools. Can't think of any but the official docs for some info into those. – Divakar Dec 27 '18 at 21:00
  • based on your answer, i just added an improvement to exclude more than one element from each row – Jason S Dec 27 '18 at 21:07
  • @JasonS Added solution for the multiple ones. – Divakar Dec 27 '18 at 21:33

Improvement on @Divakar's answer to extend this to zero or more excluded indices per row:

def excluding(A, *klist):
    """
    Excludes column k[i] from row i of A, for each index vector k in klist.
    (Within any one row the excluded columns must be distinct, i.e. the
    index vectors must not share a value at the same position.)
    """
    mask = np.ones(A.shape,dtype=bool)
    for k in klist:
        mask[np.arange(len(k)),k] = 0
    return A[mask].reshape(len(A),-1)

Test:

M = np.arange(40).reshape(4,10)
excluding(M,[1,3,2,0],[4,5,8,1])

returns

array([[ 0,  2,  3,  5,  6,  7,  8,  9],
       [10, 11, 12, 14, 16, 17, 18, 19],
       [20, 21, 23, 24, 25, 26, 27, 29],
       [32, 33, 34, 35, 36, 37, 38, 39]])
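The caveat in the docstring matters because the final reshape relies on every row losing the same number of items. A minimal sketch of what happens otherwise, using made-up index vectors in which row 0 lists column 1 twice (the function body is repeated so the snippet runs on its own):

```python
import numpy as np

def excluding(A, *klist):
    mask = np.ones(A.shape, dtype=bool)
    for k in klist:
        mask[np.arange(len(k)), k] = False
    return A[mask].reshape(len(A), -1)

M = np.arange(40).reshape(4, 10)

# Row 0 excludes column 1 twice, so it keeps 9 items while the other rows keep 8.
try:
    excluding(M, [1, 3, 2, 0], [1, 5, 8, 1])
    failed = False
except ValueError:
    failed = True  # 33 remaining items cannot be reshaped into 4 equal rows

# With no index vectors at all, nothing is excluded and the shape is unchanged.
unchanged = excluding(M)
```

In this case the mismatch surfaces as a `ValueError`, but if the leftover count happened to divide evenly by the number of rows, the reshape would succeed and silently misalign the rows, so it seems safer to validate the indices up front.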
Jason S