
I've got a 16000x16000 matrix and I need to extract a boolean mask from it. The mask indicates, for each cell, whether its value is higher than a THRESHOLD.

Here's the relevant snippet:

import numpy

def adj_mask(dmat):
    '''
    Return the locations where the value exceeds the threshold.
    '''
    global DIST

    # numpy.where with a single condition returns a tuple of index arrays,
    # not a boolean mask
    return numpy.where(dmat > DIST)
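
For reference, this is roughly how the function gets called; the array size and threshold below are only illustrative stand-ins for the real 16000x16000 matrix:

DIST = 0.5  # hypothetical threshold; the real value is set elsewhere
dmat = numpy.random.rand(4, 4)  # small stand-in for the real 16000x16000 matrix

rows, cols = adj_mask(dmat)  # numpy.where returns a tuple of index arrays
print(rows, cols)            # row/column indices of cells whose value exceeds DIST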

I have various matrices on which I need to perform this selection. However, my whole system almost freezes (hangs) when I call it on a big (16000x16000) matrix.

How can I speed up this computation?

vyi
  • You've got 256 *million* values to query in a matrix of that size... some suggestions [here](http://stackoverflow.com/questions/14351255/techniques-for-working-with-large-numpy-arrays) and [here](http://stackoverflow.com/questions/1053928/very-large-matrices-using-python-and-numpy) – David Zemens Mar 14 '16 at 17:43
  • Are you sure you need `np.where`? Where are you using this result? Would `dmat > DIST` without `np.where` be sufficient? – Eric Mar 14 '16 at 17:52
  • Is it really necessary to generate the whole 16000x16000 matrix? If it's a distance matrix then it will be symmetric, so at most you would only need to store/search the upper or lower triangle. Could you explain in more detail what you're trying to achieve? I suspect you might be much better off using a [`scipy.spatial.cKDTree`](https://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.cKDTree.query.html) instead. – ali_m Mar 14 '16 at 19:20
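
A minimal sketch of what the comments above suggest, assuming the goal really is a boolean mask rather than index arrays; the threshold and the small array below are hypothetical stand-ins, and the `cKDTree` route is not shown because it would need the original points rather than the distance matrix:

import numpy

DIST = 0.5  # hypothetical threshold
dmat = numpy.random.rand(4, 4)  # stand-in for the real 16000x16000 distance matrix

# Comparison alone already yields a boolean mask, without building the
# (potentially very large) index arrays that numpy.where produces.
mask = dmat > DIST

# If dmat is a symmetric distance matrix, the strict upper triangle holds
# all of the information, so the mask only needs to cover half the cells.
iu = numpy.triu_indices_from(dmat, k=1)
upper_mask = dmat[iu] > DIST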

0 Answers