2

I have a Numpy Array that with integer values 1 or 0 (can be cast as booleans if necessary). The array is square and symmetric (see note below) and I want a list of the indices where a 1 appears:

Note that array[i][j] == array[j][i] and array[i][i] == 0 by design. Also I cannot have any duplicates.

import numpy as np
array = np.array([
    [0, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0]
])

I would like a result that is like this (order of each sub-list is not important, nor is the order of each element within the sub-list):

[
    [0, 2], 
    [0, 4], 
    [0, 6], 
    [1, 2], 
    [1, 3],
    [1, 5],
    [2, 6],
    [3, 4],
    [3, 5],
    [4, 6]
]

Another point to make is that I would prefer not to loop over all indices twice using the condition j<i because the size of my array can be large but I am aware that this is a possibility - I have written an example of this using two for loops:

result = []
for i in range(array.shape[0]):
    for j in range(i):
        if array[i][j]:
            result.append([i, j])
print(pd.DataFrame(result).sort_values(1).values)


# using dataframes and arrays for formatting but looking for
# 'result' which is a list

# Returns (same as above but columns are the opposite way round):
[[2 0]
 [4 0]
 [6 0]
 [2 1]
 [3 1]
 [5 1]
 [6 2]
 [4 3]
 [5 3]
 [6 4]]
d-man
  • 476
  • 4
  • 24

3 Answers3

4
idx = np.argwhere(array)
idx = idx[idx[:,0]<idx[:,1]]

Another way:

idx = np.argwhere(np.triu(array))

output:

[[0 2]
 [0 4]
 [0 6]
 [1 2]
 [1 3]
 [1 5]
 [2 6]
 [3 4]
 [3 5]
 [4 6]]

Comparison:

#@bousof solution
def method1(array):
  return np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]>=0))).transpose()[:,::-1]

#Also mentioned by @hpaulj
def method2(array):
  return np.argwhere(np.triu(array))

def method3(array):
  idx = np.argwhere(array)
  return idx[idx[:,0]<idx[:,1]]

#The original method in question by OP(d-man)
def method4(array):
  result = []
  for i in range(array.shape[0]):
      for j in range(i):
          if array[i][j]:
              result.append([i, j])
  return result

#suggestd by @bousof in comments
def method5(array):
  return np.vstack(np.where(np.triu(array))).transpose()

inputs = [np.random.randint(0,2,(n,n)) for n in [10,100,1000,10000]]

Seems like method1, method2 and method5 are slightly faster for large arrays while method3 is faster for smaller cases:

enter image description here

Ehsan
  • 12,072
  • 2
  • 20
  • 33
  • @Hammad Thank you for catching that. Edited the post. – Ehsan Jul 12 '20 at 20:04
  • 1
    This answer looks great - please can you put the original method I put in for a further comparison please - then I will accept the answer – d-man Jul 16 '20 at 13:26
  • 1
    +1 Thanks for comparison. You'll probably get the best performance if you mix methods 1 and 2. You'll get a x3 speedup if you replace `np.argwhere(np.triu(array))` by `np.vstack(np.where(np.triu(array))).transpose()`. I think it is due to the `numpy.where` function being written in C not Python. See: https://stackoverflow.com/questions/54499496/where-can-i-find-numpy-where-source-code – bousof Jul 16 '20 at 15:20
  • @d-man Added your original solution for comparison. Glad it helped :) – Ehsan Jul 16 '20 at 20:46
  • @bousof Thank you. I added your proposed solution in comment. Does not seem like to improve the performance by much (They are practically the same performance). Maybe in a different system setting, it becomes more efficient. `np.where` and `np.argwhere` should not be that different. – Ehsan Jul 16 '20 at 20:47
  • 1
    Yes I'm surprised get a factor 3 difference on my laptop. Thank you very much for reactivity! I'll investigate what is happening on my PC. – bousof Jul 16 '20 at 21:08
  • 1
    thanks to all involved - wouldn't have thought there would be so many ways! – d-man Jul 16 '20 at 21:59
3
In [249]: arr = np.array([ 
     ...:     [0, 0, 1, 0, 1, 0, 1], 
     ...:     [0, 0, 1, 1, 0, 1, 0], 
     ...:     [1, 1, 0, 0, 0, 0, 1], 
     ...:     [0, 1, 0, 0, 1, 1, 0], 
     ...:     [1, 0, 0, 1, 0, 0, 1], 
     ...:     [0, 1, 0, 1, 0, 0, 0], 
     ...:     [1, 0, 1, 0, 1, 0, 0] 
     ...: ])   

The most common way of getting indices on non-zeros (True) is with np.nonzero (aka np.where):

In [250]: idx = np.nonzero(arr)                                                                      
In [251]: idx                                                                                        
Out[251]: 
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6]),
 array([2, 4, 6, 2, 3, 5, 0, 1, 6, 1, 4, 5, 0, 3, 6, 1, 3, 0, 2, 4]))

This is a tuple - 2 arrays for a 2d array. It can be used directly to index the array (or anything like it): arr[idx] will give all 1s.

Apply np.transpose to that and get an array of 'pairs':

In [252]: np.argwhere(arr)                                                                           
Out[252]: 
array([[0, 2],
       [0, 4],
       [0, 6],
       [1, 2],
       [1, 3],
       [1, 5],
       [2, 0],
       [2, 1],
       [2, 6],
       [3, 1],
       [3, 4],
       [3, 5],
       [4, 0],
       [4, 3],
       [4, 6],
       [5, 1],
       [5, 3],
       [6, 0],
       [6, 2],
       [6, 4]])

Using such an array to index arr is harder - requiring a loop and conversion to tuple.

To weed out the symmetric duplicates we could make a tri-lower array:

In [253]: np.tril(arr)                                                                               
Out[253]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [1, 0, 1, 0, 1, 0, 0]])

In [254]: np.argwhere(np.tril(arr))                                                                  
Out[254]: 
array([[2, 0],
       [2, 1],
       [3, 1],
       [4, 0],
       [4, 3],
       [5, 1],
       [5, 3],
       [6, 0],
       [6, 2],
       [6, 4]])
     

     
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • +1 For the `tril` method. I was wondering how my solution can be faster than yours and I found that you'll get a x3 speedup f you use `np.vstack(np.where(np.tril(arr))).transpose()` instead of `np.argwhere(np.tril(arr))`. I think it is due to the `numpy.where` function being written in C not Python. See: https://stackoverflow.com/questions/54499496/where-can-i-find-numpy-where-source-code – bousof Jul 16 '20 at 14:39
1

You can use numpy.where:

>>> np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]<=0))).transpose()
array([[2, 0],
       [2, 1],
       [3, 1],
       [4, 0],
       [4, 3],
       [5, 1],
       [5, 3],
       [6, 0],
       [6, 2],
       [6, 4]])

np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]<=0 is true only on the lower part of the matrix. If the order is important, you can get the same order as in the question using:

>>> np.vstack(np.where(np.logical_and(array, np.diff(np.ogrid[:array.shape[0],:array.shape[0]])[0]>=0))).transpose()[:,::-1]
array([[2, 0],
       [4, 0],
       [6, 0],
       [2, 1],
       [3, 1],
       [5, 1],
       [6, 2],
       [4, 3],
       [5, 3],
       [6, 4]])
bousof
  • 1,241
  • 11
  • 13
  • I can't have duplicates e.g. `[6, 4]` and `[4, 6]` count as duplicates (question previously didn't mention this, now it does) – d-man Jul 10 '20 at 18:13
  • thanks, please can you edit for any generic size of the input array - it won't always be 7 in length – d-man Jul 10 '20 at 18:39